Leveraging scriptable infrastructures,
Towards a paradigm shift in software for data science
Karim Chine
Cloud Era Ltd, 14 June 2013
karim.chine@cloudera.co.uk
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/elastic-r-data
Presented at QCon New York
www.qconnewyork.com
Outline
• Introduction
• Fragmentation and friction in the Data Science arena
• The understated potential of Cloud scriptability
• Elastic-R: Towards a universal platform for Data Science
• Elastic-R: Design and Technologies Overview
• Elastic-R: The scriptability toolset
• Demo
• Conclusion
Introduction
“The next information revolution is well underway. But it is
not happening where information scientists, information
executives, and the information industry in general are
looking for it. It is not a revolution in technology, machinery,
techniques, software, or speed. It is a revolution in
concepts.”
Peter Drucker. Forbes ASAP, August 24, 1998
• Science and the 4th Paradigm: Experimental Science, Theoretical Science, Computational Science, e-Science / Data-intensive Science
• The e-Research Dancing Bears
The townspeople gather to
see the wondrous sight as the
massive, lumbering beast
shambles and shuffles from
paw to paw. The bear is really
a terrible dancer, and the
wonder isn't that the bear
dances well but that the bear
dances at all.
Fragmentation and friction in the
Data Science Arena
www.scilab.org
http://root.cern.ch
www.sagemath.org
www.sas.com
office.microsoft.com
www.mathworks.com
www.scipy.org
www.spss.com
www.wolfram.com
www.python.org
R:
• Open-source (GPL) software environment for statistical computing and graphics
• Lingua franca of data analysis
• Repositories of contributed R packages related to a variety of problem domains (life sciences, social sciences, finance, econometrics, chemometrics, etc.) are growing at an exponential rate
• R is Super Glue
The understated potential of
Cloud scriptability
• 3D printers are becoming commonplace:
• Creating a three-dimensional solid object of virtually any shape from a digital model
• Scripting the physical world
• Sharing physical reality on Facebook is now as easy as sharing your holiday pictures
Elastic-R: Towards a universal
platform for Data Science
Elastic-R
• Computational Components: R packages, wrapped C, C++ and Fortran code, Python modules, Matlab toolkits… (open source or commercial)
• Computational Resources: clusters, grids, private or public clouds (free or pay-per-use)
• Computational GUIs: HTML5 and Desktop Workbench; built-in views, plugins, collaborative views (open source or commercial)
• Computational Scripts: R / Python / Matlab / Groovy
• Computational APIs: Java / SOAP / REST, stateless and stateful
• Computational Storage: local, NFS, FTP, Amazon S3, EBS
• Generated Computational Web Services: stateful or stateless, mapping of R objects/functions
[Diagram labels: Public Clouds, Private Cloud]
Elastic-R: Design and
Technologies Overview
You are here
Your data is here
• Remote Java/R processes
• Event-driven remote objects/engines
• R, Python, Mathematica, Matlab, Scilab, ...
• Collaborative spreadsheets
• Collaborative scientific graphics canvas
• Collaborative dashboard with collaborative widgets
Elastic-R Amazon Machine Images
• Elastic-R AMI 1: R 2.10, BioC 2.5
• Elastic-R AMI 2: R 2.9, BioC 2.3
• Elastic-R AMI 3: R 2.8, BioC 2.0
Amazon Elastic Block Stores
• Elastic-R EBS 1: Data Set XXX
• Elastic-R EBS 2: Data Set YYY
• Elastic-R EBS 3: Data Set ZZZ
• Elastic-R EBS 4: Data Set VVV
Elastic-R.org
• Elastic-R AMI 2 (R 2.9, BioC 2.3) combined with Elastic-R EBS 4 (Data Set VVV)
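The pairing of a machine image with a data volume shown on this slide can be pictured as plain data. The following Java sketch is only an illustration: the class names (EnginePairingSketch, MachineImage) and the findByRVersion helper are assumptions made here, not part of Elastic-R or of any AWS SDK, and nothing in it calls a cloud service. Only the image names and version numbers are taken from the slide.

```java
import java.util.List;

// Hypothetical sketch: models the AMI/EBS pairing from the slide as plain data.
// It does not contact AWS or Elastic-R.
class EnginePairingSketch {
    public static void main(String[] args) {
        // Pick the image that ships R 2.9, as in the slide's example pairing.
        MachineImage image = MachineImage.findByRVersion(MachineImage.CATALOG, "2.9");
        String dataVolume = "Elastic-R EBS 4 (Data Set VVV)";
        // prints: Elastic-R AMI 2 (R 2.9, BioC 2.3) + Elastic-R EBS 4 (Data Set VVV)
        System.out.println(image.describe() + " + " + dataVolume);
    }
}

/** Illustrative description of a machine image: a name plus the software versions it carries. */
final class MachineImage {
    // The three images listed on the slide.
    static final List<MachineImage> CATALOG = List.of(
            new MachineImage("Elastic-R AMI 1", "2.10", "2.5"),
            new MachineImage("Elastic-R AMI 2", "2.9", "2.3"),
            new MachineImage("Elastic-R AMI 3", "2.8", "2.0"));

    final String name;
    final String rVersion;
    final String biocVersion;

    MachineImage(String name, String rVersion, String biocVersion) {
        this.name = name;
        this.rVersion = rVersion;
        this.biocVersion = biocVersion;
    }

    /** Returns the first image carrying the requested R version, or fails loudly. */
    static MachineImage findByRVersion(List<MachineImage> images, String rVersion) {
        for (MachineImage image : images) {
            if (image.rVersion.equals(rVersion)) {
                return image;
            }
        }
        throw new IllegalArgumentException("No image with R " + rVersion);
    }

    String describe() {
        return name + " (R " + rVersion + ", BioC " + biocVersion + ")";
    }
}
```

Keeping the software environment (image) and the data set (volume) as separate items is what lets any image be combined with any data set, which is the point the diagram appears to make.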
Modes of access
• Individuals with AWS accounts
  - Standard AMIs (Amazon Machine Images): paid per use to Amazon. For academic use and trial purposes
  - Paid AMI: paid-per-use software model. For business users
• Individuals without AWS accounts
  - Trial tokens, purchased tokens, tokens granted by other users
  - Resources (data science engines) shared by other users
  - Individual subscriptions
• Companies / Educational & Research Institutions
  - Dedicated platform and AMIs on an Amazon VPC (Virtual Private Cloud). Paid via subscription
Elastic-R: The scriptability
toolset
Command Line
Web Console
SDK
API
WS generator
rws.war
  + mapping.jar
  + pooling framework
  + R Java Bridge
  + JAX-WS
  - Servlets
  - Generated artifacts

Web Services Generation

Script / globals.r

    # Function to publish as a web service
    square <- function(x) { return(x^2) }
    # Declare argument and return types (TypeInfo package) for the mapping
    typeInfo(square) <- SimultaneousTypeSpecification(
        TypedSignature(x = "numeric"),
        returnType = "numeric")

Script / rjmap.xml

    <rj>
      <publish>
        <functions> <function name="square" forWeb="true"/> </functions>
      </publish>
      <scripts> <initScript name="globals.r" embed="true"/> </scripts>
    </rj>

Deploy

Eclipse Web Service Client Generator

    public static void main(String[] args) throws Exception {
        // Obtain the generated client port for the published R functions
        RGlobalEnvFunctionWeb g =
            new RGlobalEnvFunctionWebServiceLocator().getrGlobalEnvFunctionWebPort();
        // Wrap the argument in the generated R numeric type
        RNumeric x = new RNumeric();
        x.setValue(new Double[] {6.0});
        // Call the remote square function and print the first element of the result
        System.out.println(g.square(x).getValue()[0]);
    }
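The client call on this slide wraps a value in RNumeric, sends it to the published square function and reads the result back. To make that round trip concrete without a deployed service, here is a minimal local sketch in Java. It is an assumption-laden stand-in: NumericVector and LocalFunctions are hypothetical names invented for this example and are not the generated Elastic-R client classes; the squaring happens locally rather than in a remote R engine.

```java
import java.util.Arrays;

// Hypothetical local stand-in for the generated web service client on the slide.
class SquareServiceSketch {
    public static void main(String[] args) {
        NumericVector x = new NumericVector(new double[] {6.0});
        NumericVector y = LocalFunctions.square(x);
        // prints: [36.0]
        System.out.println(Arrays.toString(y.getValue()));
    }
}

/** Simplified analogue of a numeric wrapper type such as RNumeric (assumption). */
final class NumericVector {
    private final double[] value;

    NumericVector(double[] value) {
        // Defensive copy so callers cannot mutate the wrapped data.
        this.value = value.clone();
    }

    double[] getValue() {
        return value.clone();
    }
}

/** Local equivalent of the R function square(x) = x^2, applied element by element. */
final class LocalFunctions {
    static NumericVector square(NumericVector x) {
        double[] in = x.getValue();
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * in[i];
        }
        return new NumericVector(out);
    }
}
```

The wrapper type mirrors the role of the declared "numeric" signature: because the function states its argument and return types, a typed client can be generated for it.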
• API: SOAP and RESTful web services
  - Data analysis engines control
  - Data analysis engines management (life cycle, etc.)
  - Virtual appliances/artifacts management
  - Platform administration
• SDKs: Java, R, Microsoft Office (VBA)
• Command line: elasticR package
• HTML5 Workbench
  - Elastic-R artifacts management interface
  - Engines control interface
Demo
• Register on the Elastic-R academic and trial portal (www.elastic-r.org)
• Create Data Science Engines using trial tokens
• Work with R, Python and Scientific Spreadsheets in the browser
• Share a Data Science Engine and collaborate
• Connect to the remote Data Science Engine from within a local R session, push and pull data, execute commands and show the impact on spreadsheets and dashboards
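The last demo step (connect, push and pull data, execute commands) can be pictured with a small in-memory stand-in. This Java sketch is hypothetical throughout: InMemoryEngine and its push, pull and apply methods are names invented here, not the elasticR package or the Elastic-R Java SDK. A real engine would be remote and would evaluate R code; here a Java lambda plays the role of the executed command.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of a push / execute / pull session against an engine workspace.
class EngineSessionSketch {
    public static void main(String[] args) {
        InMemoryEngine engine = new InMemoryEngine();
        engine.push("x", new double[] {1.0, 2.0, 3.0});   // send local data to the engine
        engine.apply("x", "y", v -> v * 10.0);            // run a command on the engine side
        // prints: [10.0, 20.0, 30.0]
        System.out.println(Arrays.toString(engine.pull("y"))); // fetch the result back
    }
}

/** In-memory stand-in for a remote engine workspace holding named numeric vectors. */
final class InMemoryEngine {
    private final Map<String, double[]> workspace = new HashMap<>();

    /** Stores a copy of the values under the given variable name. */
    void push(String name, double[] values) {
        workspace.put(name, values.clone());
    }

    /** Returns a copy of the named variable, or fails if it does not exist. */
    double[] pull(String name) {
        double[] values = workspace.get(name);
        if (values == null) {
            throw new IllegalArgumentException("No such variable: " + name);
        }
        return values.clone();
    }

    /** Applies an element-wise operation to one variable and stores the result in another. */
    void apply(String source, String target, DoubleUnaryOperator op) {
        workspace.put(target, Arrays.stream(pull(source)).map(op).toArray());
    }
}
```

In the demo as described, collaborators viewing spreadsheets or dashboards bound to the same engine would see the effect of such a command; the sketch leaves that part out.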
Conclusion
• Elastic-R unlocks the potential of the cloud for data scientists and analytical application developers and dramatically improves their productivity
• The Data Science IT factory chain, from resource acquisition to the design and publication of services and applications, becomes:
  - Fully scriptable in R
  - Under the direct control of the data scientist and the developer
  - Fully traceable and reproducible
• Elastic-R can seamlessly extend any existing application or service and can be used as a collaboration backend
• Openness, security and reliability are among the key capabilities delivered
What to do next
• Register with Elastic-R and try the HTML5 Workbench and the collaboration capabilities
• Download the R package elasticR and use it to access the cloud from local R sessions
• Download the Java SDK and create your first analytical application using AWS and advanced tools for programming with data
• Get in touch with me to explore potential partnerships and collaborations
Contact details
• Karim Chine
• karim.chine@cloudera.co.uk