SlideShare ist ein Scribd-Unternehmen logo
1 von 46
Downloaden Sie, um offline zu lesen
Data Science With R
~ for ~
Java Developers
@Sander_Mak
Agenda
Data Science
The R language
Gimme some Java!
1
1
1
1
11
1
1
0
0
0
0
0
0
90% of the world’s data was
produced in the last 2 years
- SINTEF/ScienceDaily June 2013!!!!!!!!
We need more than
just CRUD
Stand back.
I know Data Science!
Software
Engineering
Domain
Expertise
Math &
Statistics
Data
Science
Machine
Learning
Operations
Research
Danger!
Perl ahead!
Software
Engineering
Domain
Expertise
Math &
Statistics
Data
Science
Machine
Learning
Operations
Research
Danger!
Perl ahead!
Data Science:
Achievement Unlocked
R, R-Studio
Today
Data Science:
Achievement Unlocked
Agenda
Data Science
The R language
Gimme some Java!
1
1
1 1
11
1
1
0
0
0
0
0
0
Language
Designers
Statisticians
Language
Designers?
Statisticians?
The best thing about R is that it was developed by statisticians. The
worst thing about R is that... it was developed by statisticians.
- Bo Cowgill, Google
Why R, then?
Open Source
De-facto standard (in statistical research)
“It’s a DSL posing as general purpose language”
Interactive data exploration
Why not R, then?
Slow
Memory Bound
(Did I mention it’s a quirky language?)
Try googling for R...
Why not R, then?
‘If you are using R and you think
you’re in hell, this is a map for you.’
- The R Inferno
Slow
Memory Bound
(Did I mention it’s a quirky language?)
Try googling for R...
Apparently, statisticians aren’t designers, either...
VS
Dynamic (eval)
Interpreted
Static types
Compiled
Functional/OO/Procedural OO
Factor Enum
numeric
character String
Integer/Double/...
Factor Enum
numeric
character String
vector
list
dataframe
Integer/Double/...
1-based 0-based1
2
3
4
0
1
2
3
1-based 0-based1
2
3
4
0
1
2
3
for-loops
higher-order functions
sapply(vec, function(elm) {
elm + 1;
})
eager evalutationlazy evaluation
eager evalutationlazy evaluation
pass-by-value
(copy-on-write)
pass-by-reference
Function F
Value A Value A
Value A’
call F(A) modify
Studio
Central
Comprehensive
R
Archive
Network
Studio
Coding time!
Titanic Competition:
Machine Learning from Disaster
Titanic Competition:
Machine Learning from Disaster
Titanic Competition:
Machine Learning from Disaster
Sex == Female
Decision Tree
Age > 50Age > 16
Fare > 100
T FT T F
Titanic Competition:
Machine Learning from Disaster
Sex == Female
Decision Tree
Age > 50Age > 16
Random Forest
Fare > 100
T FT T F
T
FT T FT
FT T F
T
FT T FT
FT T F
Demo time!
...
...
Agenda
Data Science
The R language
Gimme some Java!
1
1
1 1
11
1
1
0
0
0
0
0
0
Bridging R and Java
Integrate
Assimilate
Replace
rJava & Java/R interface
Integrate
Two way native interface
- JNI: libjri
- or TCP to RServe
Rengine re = new Rengine(new String[] {}, false, null);
// wait until engine is ready
if (!re.waitForR()) {
throw new IllegalStateException(“Can’t load R engine”);
}
re.eval("data(cars)", false);
REXP cars = re.eval("cars");
RVector carsVector = cars.asVector();
// dissect carsVector...
Assimilate
Reimplementation of R on JVM
Fast & lean
Parallelized
Just-another-lib
... not production ready yet...
Assimilate
// create a script engine manager
ScriptEngineManager factory =
new ScriptEngineManager();
// create an R engine
ScriptEngine engine =
factory.getEngineByName("Renjin");
// load package from classpath
engine.eval(“library(survey)");
// evaluate R code from String
engine.eval("print('Hello from R')");
Reimplementation of R on JVM
Fast & lean
Parallelized
Just-another-lib
... not production ready yet...
Reimplementation of R on JVM
Share data:
Integer[] data = {1, 2, 3};
engine.put("data", data);
engine.eval("print(sum(data))");
Assimilate
Reimplementation of R on JVM
Share data:
import(com.foo.User)
# instantiate Java beans
tim <- User$new(name='Tim', age=23)
tom <- User$new(name='Tom', age=45)
# invoke setter
tim$name <- "Timmy"
Use Java from Renjin:
Integer[] data = {1, 2, 3};
engine.put("data", data);
engine.eval("print(sum(data))");
Assimilate
Big Data?
Replace
JVM Libraries/platforms
Replace
Scalable R distributions
(non-JVM)
Revolution Analytics
Oracle Enterprise R
Wrap-up
Data Science
The R language
Gimme some Java!
1
1
1 1
11
1
1
0
0
0
0
0
0
Sanitize
Explore
Model Predict
Scale
Next steps
Computing for Data Analysis
starts Sept. 23rd
Install R Read
Questions?
Data Science
The R language
Gimme some Java!
1
1
1 1
1
1
110
0
0
0
0
0
@Sander_Mak
branchandbound.net

Weitere ähnliche Inhalte

Mehr von Sander Mak (@Sander_Mak)

Modules or microservices?
Modules or microservices?Modules or microservices?
Modules or microservices?
Sander Mak (@Sander_Mak)
 

Mehr von Sander Mak (@Sander_Mak) (20)

Modules or microservices?
Modules or microservices?Modules or microservices?
Modules or microservices?
 
Migrating to Java 9 Modules
Migrating to Java 9 ModulesMigrating to Java 9 Modules
Migrating to Java 9 Modules
 
Java 9 Modularity in Action
Java 9 Modularity in ActionJava 9 Modularity in Action
Java 9 Modularity in Action
 
Java modularity: life after Java 9
Java modularity: life after Java 9Java modularity: life after Java 9
Java modularity: life after Java 9
 
Provisioning the IoT
Provisioning the IoTProvisioning the IoT
Provisioning the IoT
 
Event-sourced architectures with Akka
Event-sourced architectures with AkkaEvent-sourced architectures with Akka
Event-sourced architectures with Akka
 
TypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the painTypeScript: coding JavaScript without the pain
TypeScript: coding JavaScript without the pain
 
The Ultimate Dependency Manager Shootout (QCon NY 2014)
The Ultimate Dependency Manager Shootout (QCon NY 2014)The Ultimate Dependency Manager Shootout (QCon NY 2014)
The Ultimate Dependency Manager Shootout (QCon NY 2014)
 
Modular JavaScript
Modular JavaScriptModular JavaScript
Modular JavaScript
 
Modularity in the Cloud
Modularity in the CloudModularity in the Cloud
Modularity in the Cloud
 
Cross-Build Injection attacks: how safe is your Java build?
Cross-Build Injection attacks: how safe is your Java build?Cross-Build Injection attacks: how safe is your Java build?
Cross-Build Injection attacks: how safe is your Java build?
 
Scala & Lift (JEEConf 2012)
Scala & Lift (JEEConf 2012)Scala & Lift (JEEConf 2012)
Scala & Lift (JEEConf 2012)
 
Hibernate Performance Tuning (JEEConf 2012)
Hibernate Performance Tuning (JEEConf 2012)Hibernate Performance Tuning (JEEConf 2012)
Hibernate Performance Tuning (JEEConf 2012)
 
Akka (BeJUG)
Akka (BeJUG)Akka (BeJUG)
Akka (BeJUG)
 
Fork Join (BeJUG 2012)
Fork Join (BeJUG 2012)Fork Join (BeJUG 2012)
Fork Join (BeJUG 2012)
 
Fork/Join for Fun and Profit!
Fork/Join for Fun and Profit!Fork/Join for Fun and Profit!
Fork/Join for Fun and Profit!
 
Kscope11 recap
Kscope11 recapKscope11 recap
Kscope11 recap
 
Java 7: Fork/Join, Invokedynamic and the future
Java 7: Fork/Join, Invokedynamic and the futureJava 7: Fork/Join, Invokedynamic and the future
Java 7: Fork/Join, Invokedynamic and the future
 
Scala and Lift
Scala and LiftScala and Lift
Scala and Lift
 
Elevate your webapps with Scala and Lift
Elevate your webapps with Scala and LiftElevate your webapps with Scala and Lift
Elevate your webapps with Scala and Lift
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Data Science with R for Java developers