val ScAlH2O = Scala ++ H2O
San Francisco Data Science
Why Scala & H2O?
● H2O ~ fast, distributed, large-scale computation platform providing a rich Java API
  – But low-level and, for many users, too complicated

public class ShuffleTask extends MRTask2<ShuffleTask> {
  @Override public void map(Chunk ic, Chunk oc) {
    if (ic._len == 0) return;
    // Each vector is shuffled in the same way
    Random rng = Utils.getRNG(0xe031e74f321f7e29L + (ic.cidx() << 32L));
    oc.set0(0, ic.at0(0));
    for (int row = 1; row < ic._len; row++) {
      int j = rng.nextInt(row + 1); // inclusive upper bound <0,row>
      if (j != row) oc.set0(row, oc.at0(j));
      oc.set0(j, ic.at0(row));
    }
  }
}
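
The map body above is an inside-out Fisher-Yates shuffle applied to one chunk. A minimal sketch of the same loop in plain Scala, without any H2O types, may make the algorithm easier to follow. The object name `ShuffleSketch`, the method `shuffled`, and the use of `scala.util.Random` in place of `Utils.getRNG` are assumptions made for illustration only.

```scala
import scala.util.Random

object ShuffleSketch {
  // Builds a shuffled copy of `in` in a single pass (inside-out Fisher-Yates).
  // `seed` stands in for the per-chunk seed derived in the Java code above.
  def shuffled(in: Array[Double], seed: Long): Array[Double] = {
    val out = new Array[Double](in.length)
    if (in.nonEmpty) {
      val rng = new Random(seed)
      out(0) = in(0)
      for (row <- 1 until in.length) {
        val j = rng.nextInt(row + 1)      // uniform in 0..row, inclusive
        if (j != row) out(row) = out(j)   // move the element at j to the new slot
        out(j) = in(row)                  // place the incoming element at j
      }
    }
    out
  }
}
```

Calling `ShuffleSketch.shuffled(Array(1.0, 2.0, 3.0), 42L)` twice with the same seed yields the same permutation, which mirrors the "each vector is shuffled in the same way" comment.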
What we provide
● ScAlH2O - Scala library providing a DSL
  – Easy data manipulation and distributed computation
  – Abstraction of the H2O low-level API
  – BUT still inside the JVM
● Scala REPL integration into H2O
  – Console for experimenting with ScAlH2O
Basic concepts
● First-class entities
  – Scalars
  – Frames
● Scala expressions
● Access to H2O algos
  – While still preserving access to the low-level H2O API
Frame operations
● Parse data

    val f = parse("smalldata/cars.csv")

● Basic slicing
  – Column/Rows selectors, append

    val f1 = f("name") ++ f(*, 5 to 7)

● Scalar operations

    val f2 = f1("year") + 1900

● Support head/tail/ncols/nrows/...
● Cooperation with H2O distributed KV store
  – Load/save operations

    val g = load("cars.hex")
    val g1 = g ++ g("year") > 80
    save("cars.hex", g1)
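
To show how selectors, append and scalar operations can compose, here is a minimal in-memory sketch in plain Scala. It is not the ScAlH2O implementation: the `Frame` case class, its fields and its methods are hypothetical stand-ins, columns hold only doubles, and nothing is distributed.

```scala
object FrameSketch {
  // Hypothetical stand-in for a frame: named columns of doubles.
  final case class Frame(names: Vector[String], cols: Vector[Vector[Double]]) {
    require(names.length == cols.length, "one name per column")

    def ncols: Int = cols.length
    def nrows: Int = cols.headOption.map(_.length).getOrElse(0)

    // Column selector by name.
    def apply(name: String): Frame = {
      val i = names.indexOf(name)
      require(i >= 0, s"no column named $name")
      Frame(Vector(names(i)), Vector(cols(i)))
    }

    // Column selector by index range.
    def apply(range: Range): Frame =
      Frame(range.map(names).toVector, range.map(cols).toVector)

    // Append the columns of another frame.
    def ++(other: Frame): Frame = Frame(names ++ other.names, cols ++ other.cols)

    // Scalar operation applied to every value.
    def +(k: Double): Frame = Frame(names, cols.map(_.map(_ + k)))

    // Element-wise comparison, encoded as 1.0 / 0.0.
    def >(k: Double): Frame =
      Frame(names, cols.map(_.map(v => if (v > k) 1.0 else 0.0)))
  }
}
```

With `val f = FrameSketch.Frame(Vector("name", "year"), Vector(Vector(0.0, 1.0), Vector(70.0, 82.0)))`, the expression `f ++ (f("year") > 80)` appends one flag column. Note that in Scala an operator starting with `+` binds tighter than one starting with `>`, so in this sketch `f ++ f("year") > 80` without parentheses would be read as `(f ++ f("year")) > 80`.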
Map/filter/collect operations
● Map
  – Per value/row
● Filter
● Collect

// Returns a boolean vector
val fm = f map (new FAOp {
  def apply(rhs: Array[scala.Double]): Boolean = rhs(2) > 4
})

// Collect all cars with more than 4 cylinders
val ff = f filter (new FAOp {
  def apply(rhs: Array[scala.Double]): Boolean = rhs(2) > 4
})

// Compute the sum of column 2
val fc = f collect (0.0, new CDOp() {
  def apply(acc: scala.Double, rhs: Array[scala.Double]) = acc + rhs(2)
  def reduce(l: scala.Double, r: scala.Double) = l + r
})
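
The `collect` operator takes both an `apply` step and a `reduce` step. A plausible reading is that rows are folded within each partition of the data and the partial results are then merged. The sketch below models that with ordinary Scala collections; the names `RowOpsSketch`, `mapRows`, `filterRows` and `collect`, and the representation of partitions as nested sequences, are assumptions and not part of ScAlH2O.

```scala
object RowOpsSketch {
  type Row = Array[Double]

  // Per-row map: one Boolean per row, partition layout preserved.
  def mapRows(parts: Seq[Seq[Row]])(op: Row => Boolean): Seq[Seq[Boolean]] =
    parts.map(_.map(op))

  // Filter: keep only rows for which `op` holds.
  def filterRows(parts: Seq[Seq[Row]])(op: Row => Boolean): Seq[Seq[Row]] =
    parts.map(_.filter(op))

  // Collect: fold rows inside each partition with `step`,
  // then merge the per-partition results with `reduce`.
  def collect(parts: Seq[Seq[Row]], zero: Double)(
      step: (Double, Row) => Double,
      reduce: (Double, Double) => Double): Double =
    parts.map(_.foldLeft(zero)(step)).reduceOption(reduce).getOrElse(zero)
}
```

For example, `RowOpsSketch.collect(parts, 0.0)((acc, r) => acc + r(2), _ + _)` sums the values at index 2 across all partitions, matching the `CDOp` example above. For the result to be independent of how rows are partitioned, `reduce` should be associative and `zero` should be its identity.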
Internals
It's magic
Internals
● No magic, BUT there are key tricks
  – Connect H2O classloaders with the Scala ecosystem
    ● Make sure that all distributed objects are correctly iced
  – Translate Scala code into calls of the Java API
    ● Pass operations around the cloud
    ● Create H2O MR tasks
    ● Create new frames
  – Preserve primitive types
    ● Do not introduce the overhead of boxing/unboxing
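
The point about primitive types can be checked on the JVM directly. The operators in this deck receive rows as `Array[scala.Double]`, which Scala compiles to a JVM `double[]`, whereas a generic collection such as `List[Double]` stores boxed `java.lang.Double` objects. The small sketch below only illustrates that difference; the object and method names are made up for the example.

```scala
object BoxingSketch {
  // Array[Double] is a JVM double[]: elements are stored unboxed.
  val primitive: Array[Double] = Array(1.0, 2.0, 3.0)

  // A generic collection holds references, so each element is boxed.
  val boxed: List[Double] = List(1.0, 2.0, 3.0)

  // True when the array's element type is a JVM primitive.
  def isPrimitiveArray(a: Array[_]): Boolean =
    a.getClass.getComponentType.isPrimitive

  // Runtime class of the first element as stored in the list.
  def elementClass(xs: List[Any]): Class[_] = xs.head.getClass
}
```

Here `BoxingSketch.isPrimitiveArray(BoxingSketch.primitive)` is true, while `BoxingSketch.elementClass(BoxingSketch.boxed)` is `java.lang.Double`.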
Internals – translation to H2O MR tasks
// Collect all cars with more than 4 cylinders
val f5 = f filter (new FAOp {
  def apply(rhs: Array[scala.Double]): Boolean = rhs(2) > 4
})

// T_A2B_Transf has to be water.Freezable
def filter(af: T_A2B_Transf[scala.Double]): T = {
  val f = frame()
  val mrt = new MRTask2() {
    override def map(in: Array[Chunk], out: Array[NewChunk]) = {
      val rlen = in(0)._len
      val tmprow = new Array[scala.Double](in.length)
      for (row: Int <- 0 until rlen) {
        if (af(Utils.readRow(in, row, tmprow))) {
          for (i: Int <- 0 until in.length) out(i).addNum(tmprow(i))
        }
      }
    }
  }
  mrt.doAll(f.numCols(), f)
  val result = mrt.outputFrame(f.names(), f.domains())
  apply(result) // return the DFrame
}
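
The map body above reads each row of a column-oriented chunk into one reused buffer, tests it, and appends the row to the output columns when the predicate holds. The following sketch reproduces that loop with plain arrays so it can be run without H2O. `ChunkFilterSketch`, `filterChunk` and this `readRow` are hypothetical helpers; plain `Array[Array[Double]]` stands in for `Array[Chunk]`, and `ArrayBuffer` stands in for `NewChunk`.

```scala
import scala.collection.mutable.ArrayBuffer

object ChunkFilterSketch {
  // Copies row `row` of the column-oriented input into `buf` and returns `buf`.
  def readRow(in: Array[Array[Double]], row: Int, buf: Array[Double]): Array[Double] = {
    var i = 0
    while (i < in.length) { buf(i) = in(i)(row); i += 1 }
    buf
  }

  // Filters one chunk; in(c)(r) is the value of column c at row r.
  def filterChunk(in: Array[Array[Double]],
                  keep: Array[Double] => Boolean): Array[Array[Double]] = {
    val out = Array.fill(in.length)(ArrayBuffer.empty[Double])
    val rlen = if (in.isEmpty) 0 else in(0).length
    val tmprow = new Array[Double](in.length) // reused for every row
    for (row <- 0 until rlen) {
      if (keep(readRow(in, row, tmprow))) {
        for (i <- in.indices) out(i) += tmprow(i)
      }
    }
    out.map(_.toArray)
  }
}
```

Because `tmprow` is overwritten on each iteration, the predicate must not keep a reference to the array it receives; the same constraint would apply to any operator that is handed a reused row buffer.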
Party demo time!
Towards Scalding-like API
● Vision is to provide Scalding-like syntax

    // ('name, 'cylinders) is the input scheme, ('name, 'moreThan4) the output scheme,
    // and the function literal is the transformation
    f map ( ('name, 'cylinders) -> ('name, 'moreThan4) )
      { (n:String, c:Int) => (n, if (c>4) 1 else 0) }

● But so far the DSL is still ugly

    f map (f, ('name, 'cylinders) -> ('name, 'moreThan4) )
      { new IcedFunctor2to2[Double,Int,Double,Int] {
          def apply(n:Double, c:Int) = (n, if (c>4) 1 else 0) }
      }
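
A scheme-based map of this kind can be sketched in a few lines of plain Scala. The version below is only an illustration under simplifying assumptions: column names are strings rather than symbols, all values are doubles, and `Table` and `map2` are hypothetical names that do not exist in ScAlH2O or Scalding.

```scala
object SchemeMapSketch {
  // Hypothetical row-oriented table with named columns.
  final case class Table(names: Vector[String], rows: Vector[Vector[Double]]) {

    // Reads the two input columns by name, applies `fn` to each row,
    // and returns a table whose columns carry the two output names.
    def map2(scheme: ((String, String), (String, String)))(
        fn: (Double, Double) => (Double, Double)): Table = {
      val ((inA, inB), (outA, outB)) = scheme
      val ia = names.indexOf(inA)
      val ib = names.indexOf(inB)
      require(ia >= 0 && ib >= 0, "unknown input column")
      Table(Vector(outA, outB), rows.map { r =>
        val (a, b) = fn(r(ia), r(ib))
        Vector(a, b)
      })
    }
  }
}
```

Given a table `t` with columns `name` and `cylinders`, the call `t.map2(("name", "cylinders") -> ("name", "moreThan4")) { (n, c) => (n, if (c > 4) 1.0 else 0.0) }` keeps the first column and replaces the second with a flag.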
Try and contribute!

> git clone git@github.com:0xdata/h2o.git
> cd h2o
> git checkout -b h2oscala origin/h2oscala
> cd h2o-scala && ./depl.sh # or sbt compile
=== Welcome to the world of ScAlH2O ===
Type `help` or `example` to begin...
h2o>

Thank you!

Michal Malohlava presents: Open Source H2O and Scala
