Data Science in
Ruby? Is it possible?
Is it Fast? Should we
use it?
• Rodrigo Urubatan
• rodrigo@urubatan.dev
• http://urubatan.dev
• http://twitter.com/urubatan
Anyone here work
with Data Science?
• Data Scientists?
• Data Engineers?
• Developers of applications that use data?
• Statisticians?
What exactly
is Data
Science?
The process of extracting meaning from data and
interpreting it
The usage of statistics and machine learning to clean
and manipulate data
The usage of computer software to collect, clean,
manipulate and interpret data
A cool name for the combination of Data Mining and
Business Intelligence (other buzzwords that were used
for a long time for exactly what we call Data Science
today, but with more expensive tool sets)
Can Ruby do Data
Science?
Can Ruby do
Data Science?
(Long Answer)
• Standing on the shoulders of giants
• pycall - Bridge into the Python world.
• rserve-client - Ruby connector for Rserve, R's binary server.
• Data Manipulation
• kiba - lightweight Ruby ETL (Extract-Transform-Load) framework.
• jongleur - Workflow manager using DAG definitions to execute ETL tasks.
• Distributed Computing
• ruby-spark - Ruby interface to Apache Spark 1.x.x.
• Data Structures
• daru - Data Frame and Vector structures with comprehensive manipulation and visualization methods.
• numo-narray - n-dimensional Numerical Array for Ruby.
• nmatrix - dense and sparse linear algebra library for Ruby via SciRuby.
• Datasets
• rdatasets - Data sets available in R via Rdatasets.
• red-datasets - Growing collection of publicly available data sets such as CIFAR-10, Iris, MNIST, etc.
• Statistics
• rb-gsl - Ruby interface to the GNU Scientific Library. [dep: GSL]
• simple_stats - Enumerable patches for descriptive statistics.
• enumerable-statistics - fast implementation of descriptive statistics for the Enumerable module.
• Visualization
• matplotlib - Ruby-based wrapper around matplotlib. [dep: matplotlib]
• mathematical - PNG and MathML renderings for your equations.
• daru-view - interactive plotting gem for web applications (any Ruby web application framework, such as Rails/Sinatra/Nanoc/Hanami) and IRuby notebooks. It is a plugin gem for daru.
• daru-plotly - Plotly-based visualization for Daru.
Ruby x Python
Ruby: Daru, NMatrix/NArray
Python: Pandas, NumPy
The 3 Major
Ruby Data
Science
Projects
SciRuby project
NMatrix-centric gems
NMatrix
Daru
GnuplotRB
Statsample
Ruby Numo project
Numo::NArray-centric gems
Numo::NArray
Numo::FFTE
Numo::FFTW
Numo::Gnuplot
Red Data Tools project
Apache Arrow-centric gems
RedArrow
RedChainer
RedArrowGSL
RedArrowNMatrix
RedArrowNumoNArray
Other libraries
• ruby-spark - Ruby interface to Apache Spark 1.x.x.
• The project is almost dead, no commits in ages
• kiba - lightweight Ruby ETL (Extract-Transform-Load) framework.
• Great framework to load and transform data, great performance
• enumerable-statistics - fast implementation of descriptive statistics for the Enumerable module.
• Very handy for small statistical calculations in your application
• iruby - Ruby kernel for Jupyter.
• The easiest way to use Ruby in your Jupyter Notebook
• decisiontree - Decision Tree ID3 algorithm in pure Ruby
• Easy decision tree implementation, and very fast to train
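As a rough illustration of the kind of small in-application statistical calculation mentioned in the list above, here is a minimal plain-Ruby sketch. It uses only the core library. The module and method names (`SimpleDescriptiveStats`, `mean`, `sample_variance`, `sample_stdev`) and the sample data are made up for this example and are not claimed to match the API of enumerable-statistics, which ships faster native implementations.

```ruby
# Hypothetical helpers written in plain Ruby (no gems required).
module SimpleDescriptiveStats
  module_function

  # Arithmetic mean of a non-empty collection of numbers.
  def mean(values)
    values.sum.to_f / values.size
  end

  # Unbiased sample variance (divides by n - 1); needs at least two values.
  def sample_variance(values)
    m = mean(values)
    values.sum { |v| (v - m)**2 } / (values.size - 1)
  end

  # Sample standard deviation, the square root of the sample variance.
  def sample_stdev(values)
    Math.sqrt(sample_variance(values))
  end
end

# Illustrative data only.
response_times_ms = [2, 4, 4, 4, 5, 5, 7, 9]

puts SimpleDescriptiveStats.mean(response_times_ms)
puts SimpleDescriptiveStats.sample_variance(response_times_ms)
puts SimpleDescriptiveStats.sample_stdev(response_times_ms)
```

For anything beyond a handful of values, a dedicated gem is likely the better choice. The sketch only shows how little code such a calculation needs inside a Ruby application.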
Doing data science in
Ruby is Hard!
Ruby and Ruby on Rails are
way better for writing business
web applications!
We can even do
really good Machine
Learning with Ruby
(but that is subject
for another
presentation)
And my objective is to
help Ruby developers use
the best tools for each job so
they can solve hard
problems with fewer bugs and
have more free time.
pycall to the
rescue
pycall lets you use Python libraries from
your Ruby code very naturally, as if you
were calling a Ruby library
pycall consists of a Ruby binding
library for libpython.so and an
object-oriented protocol for communication
between Ruby and Python
Simple pycall
code
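The code shown on this slide is not part of the transcript, so the following is only a minimal sketch of what calling Python through pycall can look like, not the original example. It assumes the pycall gem and a Python installation are available. `PyCall.import_module` is the gem's call for importing a Python module. The helper name `python_sqrt` is hypothetical, and it returns nil instead of failing when pycall or Python cannot be loaded.

```ruby
# Returns Python's math.sqrt(x) via pycall, or nil when the pycall gem
# or a usable Python installation is not available on this machine.
def python_sqrt(x)
  require 'pycall'
  math = PyCall.import_module('math') # Python's standard math module
  math.sqrt(x)                        # called like an ordinary Ruby method
rescue LoadError, StandardError => e
  warn "pycall unavailable: #{e.class}"
  nil
end

result = python_sqrt(16.0)
puts result.inspect
```

The point of the sketch is the call style: once the module is imported, Python functions are invoked with normal Ruby method-call syntax.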
What about some
light machine
learning? Do I
need Python too?
Ok, so what
are the best
work
patterns?
Python is way better than Ruby for
Data Science
Ruby is better for web business
applications
Best patterns for integration are
(IMHO)
• Pointing both applications to the same
database
• Exchanging data through JSON or some similar
serialization
• Calling Python directly through pycall
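The JSON pattern from the list above can be sketched from the Ruby side with the standard library alone. This is an illustration under assumptions: the field names (`customer_id`, `features`, `score`), the helper names, and the canned response are all hypothetical, and the transport (HTTP, a queue, a file) is left out on purpose.

```ruby
require 'json'

# Builds the JSON document a Ruby application might hand to a Python service.
# The field names are hypothetical.
def build_request(customer_id, features)
  JSON.generate({ customer_id: customer_id, features: features })
end

# Parses a (hypothetical) JSON reply produced on the Python side.
# fetch raises KeyError if an expected field is missing.
def parse_response(body)
  data = JSON.parse(body, symbolize_names: true)
  { customer_id: data.fetch(:customer_id), score: Float(data.fetch(:score)) }
end

request_body = build_request(42, [0.5, 1.25, 3.0])

# Stand-in for what the Python side might send back.
response_body = '{"customer_id": 42, "score": 0.87}'
prediction = parse_response(response_body)

puts request_body
puts prediction[:score]
```

Keeping both sides on a small, explicit JSON contract means the Ruby and Python applications can be deployed and changed independently, at the cost of serialization overhead that calling Python directly through pycall avoids.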
References
• Ruby Conf 2017 – Using Ruby in DataScience by Kenta Murata (@mrkn)
• Big Data analysis in Ruby
• Lets do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst)
• Progress of Ruby/Numo: Numerical Computing for Ruby
• SciRuby
• Ruby::Numo
• Ruby Machine Learning resources
• Ruby Data Science Resources
• PyCall
Any questions? Talk to
me!
• @urubatan
• https://urubatan.dev
• rodrigo@urubatan.dev

Weitere ähnliche Inhalte

Was ist angesagt?

Curse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleCurse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleMichael Goodness
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasWes McKinney
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platformLINE Corporation
 
Predictive Models at Scale
Predictive Models at ScalePredictive Models at Scale
Predictive Models at ScaleNikhil Ketkar
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopJason Plurad
 
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in PythonJagriti Goswami
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberXiang Fu
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Ziemowit Jankowski
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraphJason Plurad
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summitAdam Gibson
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistAlexey Zinoviev
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkMárton Balassi
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Alexey Zinoviev
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJason Plurad
 
Exploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphExploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphJason Plurad
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseJason Plurad
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013show you
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsZhenxiao Luo
 

Was ist angesagt? (20)

Curse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleCurse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at Scale
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
 
Predictive Models at Scale
Predictive Models at ScalePredictive Models at Scale
Predictive Models at Scale
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in Python
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraph
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache Flink
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
 
Exploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphExploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraph
 
F# for Data*
F# for Data*F# for Data*
F# for Data*
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use Case
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 

Ähnlich wie Data science in ruby, is it possible? is it fast? should we use it?

Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to productionGeorg Heiler
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Benjamin Nussbaum
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015Wes McKinney
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Perficient, Inc.
 
ROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyRakuten Group, Inc.
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RRadek Maciaszek
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudRightScale
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-PatternsDouglas Moore
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"Portland R User Group
 

Ähnlich wie Data science in ruby, is it possible? is it fast? should we use it? (20)

Session 2
Session 2Session 2
Session 2
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Python ml
Python mlPython ml
Python ml
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
ROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in Ruby
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Big data analytics using R
Big data analytics using RBig data analytics using R
Big data analytics using R
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
 

Mehr von Rodrigo Urubatan

2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...Rodrigo Urubatan
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...Rodrigo Urubatan
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...Rodrigo Urubatan
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d frameworkRodrigo Urubatan
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuídaRodrigo Urubatan
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRodrigo Urubatan
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddRodrigo Urubatan
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotoRodrigo Urubatan
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problemsRodrigo Urubatan
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JSRodrigo Urubatan
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Rodrigo Urubatan
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JSRodrigo Urubatan
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextRodrigo Urubatan
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores javaRodrigo Urubatan
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPRodrigo Urubatan
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu clienteRodrigo Urubatan
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Rodrigo Urubatan
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time agoRodrigo Urubatan
 

Mehr von Rodrigo Urubatan (20)

Ruby code smells
Ruby code smellsRuby code smells
Ruby code smells
 
2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d framework
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuída
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDD
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bdd
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remoto
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problems
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores java
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HP
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
 
Mini curso rails 3
Mini curso rails 3Mini curso rails 3
Mini curso rails 3
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time ago
 

Kürzlich hochgeladen

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Science in Ruby: Is it possible? Is it fast? Should we use it?

  • 1. Data Science in Ruby? Is it possible? Is it Fast? Should we use it? • Rodrigo Urubatan • rodrigo@urubatan.dev • http://urubatan.dev • http://twitter.com/urubatan
  • 2. Anyone here work with Data Science? • Data Scientists? • Data Engineers? • Developers of applications that use data? • Statisticians?
  • 3. What exactly is Data Science?
      • The process of extracting meaning from data and interpreting it
      • The use of statistics and machine learning to clean and manipulate data
      • The use of computer software to collect, clean, manipulate and interpret data
      • A cool name for the combination of Data Mining and Business Intelligence (other buzzwords that were used for a long time for exactly what we call Data Science today, but with more expensive tool sets)
  • 4. Can Ruby do Data Science?
  • 5. Can Ruby do Data Science? (Long Answer)
      • Standing on the shoulders of giants
          • pycall: bridge into the Python world.
          • rserve-client: Ruby connector for Rserve, R's binary server.
      • Data Manipulation
          • kiba: lightweight Ruby ETL (Extract-Transform-Load) framework.
          • jongleur: workflow manager using DAG definitions to execute ETL tasks.
      • Distributed Computing
          • ruby-spark: Ruby interface to Apache Spark 1.x.x.
      • Data Structures
          • daru: Data Frame and Vector structures with comprehensive manipulation and visualization methods.
          • numo-narray: n-dimensional numerical array for Ruby.
          • nmatrix: dense and sparse linear algebra library for Ruby via SciRuby.
      • Datasets
          • rdatasets: data sets available in R via Rdatasets.
          • red-datasets: growing collection of publicly available data sets such as CIFAR-10, Iris, MNIST, etc.
      • Statistics
          • rb-gsl: Ruby interface to the GNU Scientific Library. [dep: GSL]
          • simple_stats: Enumerable patches for descriptive statistics.
          • enumerable-statistics: fast implementation of descriptive statistics for the Enumerable module.
      • Visualization
          • matplotlib: Ruby-based wrapper around matplotlib. [dep: matplotlib]
          • mathematical: PNG and MathML renderings for your equations.
          • daru-view: interactive plotting gem for web applications (any Ruby web application framework such as Rails, Sinatra, Nanoc or Hanami) and IRuby notebooks. It is a plugin gem for daru.
          • daru-plotly: Plotly-based visualization for Daru.
  • 7. The 3 Major Ruby Data Science Projects
      • SciRuby project (NMatrix-centric gems): NMatrix, Daru, GnuplotRB, Statsample
      • Ruby Numo project (Numo::NArray-centric gems): Numo::NArray, Numo::FFTE, Numo::FFTW, Numo::Gnuplot
      • Red Data Tools project (Apache Arrow-centric gems): RedArrow, RedChainer, RedArrowGSL, RedArrowNMatrix, RedArrowNumoNArray
  • 8. Other libraries
      • ruby-spark: Ruby interface to Apache Spark 1.x.x.
          • The project is almost dead, with no commits in ages
      • kiba: lightweight Ruby ETL (Extract-Transform-Load) framework.
          • Great framework to load and transform data, with great performance
      • enumerable-statistics: fast implementation of descriptive statistics for the Enumerable module.
          • Very handy for small statistical calculations in your application
      • iruby: Ruby kernel for Jupyter.
          • The easiest way to use Ruby in your Jupyter Notebook
      • decisiontree: Decision Tree ID3 algorithm in pure Ruby
          • Easy decision tree implementation, and very fast to train
  • 9. Doing data science in Ruby is Hard!
  • 10. Ruby and Ruby on Rails are way better for writing business web applications!
  • 11. We can even do really good Machine Learning with Ruby (but that is a subject for another presentation)
  • 12. And my objective is to help Ruby developers use the best tool for each job so they can solve hard problems with fewer bugs and have more free time.
  • 13. pycall to the rescue
      • pycall lets you use Python libraries from your Ruby code very naturally, as if you were calling a Ruby library
      • pycall consists of a Ruby binding library for libpython.so and an object-oriented protocol for communication between Ruby and Python
  • 15. What about some light machine learning? Do I need Python too?
  • 16. Ok, so what are the best work patterns?
      • Python is way better than Ruby for Data Science
      • Ruby is better for business web applications
      • Best patterns for integration are (IMHO):
          • Pointing both applications to the same database
          • Exchanging data through JSON or some similar serialization
          • Calling Python directly through pycall
  • 17. References
      • RubyConf 2017: Using Ruby in Data Science, by Kenta Murata (@mrkn)
      • Big Data analysis in Ruby
      • Let's do some (Data) Science in Ruby, by Dan Carpenter (@dan_alyst)
      • Progress of Ruby/Numo: Numerical Computing for Ruby
      • SciRuby
      • Ruby::Numo
      • Ruby Machine Learning resources
      • Ruby Data Science Resources
      • PyCall
  • 18. Any questions? Talk to me! • @urubatan • https://urubatan.dev • rodrigo@urubatan.dev
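
Of the integration patterns listed on slide 16, "exchanging data through JSON" is the easiest to try without installing anything. The sketch below shows only the Ruby side, using the standard `json` library. The record fields, the `orders.v1` schema label, the helper names `export_payload` and `import_result`, and the shape of the reply are all hypothetical, chosen for illustration; the Python side is assumed to exist and is simulated here by a hard-coded reply string.

```ruby
require "json"

# Hypothetical order records, standing in for rows a Rails app might load.
orders = [
  { id: 1, customer: "alice", total: 120.5 },
  { id: 2, customer: "bob",   total: 80.0 },
  { id: 3, customer: "alice", total: 42.25 }
]

# Ruby side: serialize the records into a JSON document that a Python
# process could read from a file, a queue, or an HTTP request body.
# The "schema" label is an assumption, not part of any real protocol.
def export_payload(records)
  JSON.generate({ "schema" => "orders.v1", "rows" => records })
end

# Ruby side: parse the JSON reply the Python side is assumed to send back.
def import_result(json_text)
  JSON.parse(json_text, symbolize_names: true)
end

payload = export_payload(orders)

# Simulated reply: in a real setup this string would come from the Python
# service (for example a pandas group-by), not from a literal.
reply = '{"totals_by_customer": {"alice": 162.75, "bob": 80.0}}'
result = import_result(reply)

puts payload
puts result[:totals_by_customer][:alice]
```

Keeping the exchange format as plain JSON means neither application needs the other's runtime installed; the trade-off is that both sides must agree on field names and types by convention.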