SlideShare a Scribd company logo
1 of 68
REPRODUCIBLE RESEARCH AT
SCALE WITH APACHE SPARK
AND ZEPPELIN NOTEBOOK
CAROLYN DUBY
SOLUTIONS ENGINEER, NORTHEAST
HORTONWORKS
@ODSC
OPEN
DATA
SCIENCE
CONFERENCE
Boston | May 3-5th
ABOUT CAROLYN DUBY
• Big Data Solutions Architect
• High performance data intensive systems
• Data science
• ScB ScM Computer Science, Brown University
• LinkedIn: https://www.linkedin.com/in/carolynduby/
• Twitter: @carolynduby Github: carolynduby
• Hortonworks
• Innovation through data
• Enterprise ready, 100% open source, modern data platforms
• Engineering, Technical Support, Professional Services, Training
https://www.meetup.com/futureofdata-
boston/
AGENDA
• What is Reproducible Research? Why do it?
• What does it take to do Reproducible Research at scale?
• Example with Apache Zeppelin and Spark
REPRODUCIBLE RESEARCH
• Complete details of data analysis
methods yielding conclusions
• Replication of research on
independently collected data
• Gold standard
BENEFITS
• Individual productivity
• Effective peer review
• Answer questions more quickly
• Correct errors
• Apply methods to other experiments
• Increased quality and respect for results
• Justify business decisions
CHALLENGES
• Large data sets
• Complex analysis
• Data lineage
• Streaming data
• Limited space in publications
HOW TO DO REPRODUCIBLE RESEARCH
• Define Platform
• Record all versions of analysis software and installation procedures
• Analyze Data
• Record all commands to acquire, clean, organize, analyze
• Store intermediate results
• Version control
• Share Methods and Results
• Publish
• Share full details
REPRODUCIBLE RESEARCH WITH APACHE
OPEN SOURCE
• Apache Spark version 2.1
• Cleaning and analysis of large data sets
• http://spark.apache.org
• Apache Zeppelin Notebook 0.7.0
• Capture automated commands
• Visualize data for exploration and results
• https://zeppelin.apache.org
APACHE SPARK
• Distributed processing efficiently crunches large data sets
• Optimized
• Horizontally scalable with multi tenancy
• Fault tolerant
• One platform for streaming, cleaning, analyzing
• Elegant APIs – Scala, Python, Java, R
• Many data source connectors – file system, HDFS, Hive,
Phoenix, S3, etc
SPARK LIBRARIES
• Same API for all data sources
• SQL - http://spark.apache.org/sql/
• Access structured data and combine with other sources
• MLLIB - http://spark.apache.org/mllib/
• Machine learning for training models and predicting
• GraphX - http://spark.apache.org/graphx/
• Connectivity algorithms
• Streaming - http://spark.apache.org/streaming/
• Complex event processing and data ingest
ZEPPELIN
• Notebook
• Combine mark down, shell, spark, sql commands in same notebook
• Easily integrate with Spark in different languages
• Visualize data using graphs and pivot charts
• Share notebooks or paragraphs
ARCHITECTURE
Spark Driver
Zeppelin Spark
Application
Master
YARN container
Spark Executor
YARN container
Task Task
Spark Executor
YARN container
Task Task
Spark Executor
YARN container
Task Task
Client Browser
GETTING STARTED
• Use a distribution
• Curated set of compatible open source projects
• Sandbox - single node cluster in VM or Azure
• https://hortonworks.com/products/sandbox/
• Hortonworks Community Connection
• http://community.hortonworks.com
• On premise
• Use Apache Ambari to manage on premise physical hardware
• Cloud
• Automated provisioning with Cloudbreak (https://github.com/sequenceiq/cloudbreak)
• AWS, Azure, Google Cloud
ZEPPELIN BASICS
• Notes are composed of paragraphs
• Paragraph contains code or markdown
• Specify interpreter - % <interpreter name> or blank for default
• Enter commands
• Click play button to run code on cluster
• Results display in paragraph
• Code and results can be shown or hidden
Create/open
Note
Note tools
Paragraph
tools
User and note
configuration
Markdown
Interpreter (%md)
(editor hidden)
Shell
Interpreter (%sh)
(editor shown)
MARKDOWN
# headers
%md
hyperlink
show/hide
editor
run paragraph
run all paragraphs
block quote
EXAMPLE
• Crimes in Chicago Kaggle
Dataset
• Interesting opportunities
for time series and
prediction
https://www.kaggle.com/currie32/crimes-in-chicago
DATA PIPELINE
Acquire
Kaggle
Common Store
Raw CSV
zip
Clean ORC
Clean
Explore
Analyze
OPTIMIZING DATA CLEANING
• Keep a raw copy
• Web sites go away, remove data, change links and interfaces
• Store the clean data
• Saves time each time you analyze
• Use a standard format (Optimized Row Columnar(ORC),
parquet, etc)
• Query data with hive
• Shared location if security and privacy requirements allow
• Collaborate by sharing data with others
ACQUIRE DATASET
Acquire
Kaggle
Common Store
Raw CSV
zip
Clean ORC
Clean
Explore
Analyze
DOCUMENTING PLATFORM PREPARATION
DOCUMENT VERSIONS
• %sh interpreter
• Bash shell
• Show
intermediate
results for debug
CLEAN DATASET
Acquire
Kaggle
Common Store
Raw CSV
zip
Clean ORC
Clean
Explore
Analyze
Switching
To spark
Scala
code
SPARK IS FAST BUT LAZY
• Transformations
• Specify which data to read
• Modify data
• Actions
• Show data
• Write data
Header and
Case data on
Same CSV line
Apply numeric
types
On clean data
Add some
columns to make
aggregations
easier
Table for SQL
Save clean
data as ORC
EXPLORE DATASET
Acquire
Kaggle
Common Store
Raw CSV
zip
Clean ORC
Clean
Explore
Analyze
Read clean
data and
create table
Specify query
Select visualization
Configure visualization
X
Y
Hover to see
values
OTHER INTERPRETERS
• Matplotlib with pyspark
• Angular – maps
ANGULAR WITH INPUT
https://community.hortonworks.com/articles/75834/using-angular-within-apache-
zeppelin-to-create-cus.html
ANALYZE DATASET
Acquire
Kaggle
Common Store
Raw CSV
zip
Clean ORC
Clean
Explore
Analyze
CREATE DATA TO FIT POISSON
FIT POISSON MODEL
Evaluate
Model
MODEL PIPELINES
• https://spark.apache.org/docs/2.1.0/ml-pipeline.html
TRANFORME
RTransformers Estimator
Pipeline
Model
Training
data
Test
data
Predictio
ns
PYTHON – LOOKS LIKE SCALA
R
TIPS AND TRICKS
• Use val for variables used across paragraphs
• Vars can yield unpredictable results when run out of order
• Break up big notebooks
• Store intermediate results
• Avoid reloading and recalculating the same values
• Verify your notebook by running all paragraphs
SHARING NOTEBOOKS
• Share link to notebook or paragraph
• Readers access your Zeppelin server
• Use logins and permissions
• Export to JSON and save to shared file
• Readers get JSON from shared file (github, cloud, etc)
• Import to their Zeppelin server
• Sync your to Zeppelin Hub (https://www.zeppelinhub.com)
• Share Zeppelin Hub link with readers
• Free version for small teams
REUSING NOTEBOOKS
• Clone notebook
• Copy code from notebook
• Build libraries for use in notebooks
VERSIONING NOTEBOOKS
• Track changes to notebook code or text
• Go back to a previously known good version
• Compare versions to see differences
CONFIGURE ZEPPELIN VERSION CONTROL
1
2
3
4
5Set storage to
GitNotebookRepo in
Zeppelin-env.xml
Restart Zeppelin server
SAVE NOTE VERSION
Enter version
comment and click
Commit
VIEWING VERSION HISTORY AND PREVIOUS
VERSIONS
Pull down the list of
versions
Select a version
Zeppelin shows
content for that
version
Head goes to the
latest version
GIT REPO ON ZEPPELIN SERVER
Zeppelin creates git repo
In notebook directory
QUESTIONS AND THANK YOU!
REFERENCES
REPRODUCIBLE RESEARCH
• Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple
Rules for Reproducible Computational Research. PLoS Comput
Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
• http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.p
cbi.1003285&type=printable
ZEPPELIN AND SPARK
• Spark
• https://dzone.com/articles/try-the-latest-innovations-in-apache-
spark-and-apa
• https://hortonworks.com/hadoop-tutorial/learning-spark-zeppelin/
• https://spark.apache.org/docs/2.1.0/ml-pipeline.html
• Example Notebooks
• https://github.com/hortonworks-gallery/zeppelin-notebooks
ZEPPELIN INTERPRETERS
• Markdown syntax
• http://daringfireball.net/projects/markdown/syntax
EXAMPLE
• Chicago Crimes Data Set
• https://www.kaggle.com/currie32/crimes-in-Chicago
• Example notebooks
• https://github.com/carolynduby/ODSC2017

More Related Content

What's hot

Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at TwitterPrasad Wagle
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Databricks
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudSri Ambati
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIDatabricks
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Pat Patterson
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
 
Elasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingCascading
 
Databricks @ Strata SJ
Databricks @ Strata SJDatabricks @ Strata SJ
Databricks @ Strata SJDatabricks
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Databricks
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IODatabricks
 
H2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SFH2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SFSri Ambati
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be StreamedDatabricks
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...Simon Ambridge
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnDatabricks
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 

What's hot (20)

Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
 
Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)Announcing Databricks Cloud (Spark Summit 2014)
Announcing Databricks Cloud (Spark Summit 2014)
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...
 
Elasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log Processing
 
Databricks @ Strata SJ
Databricks @ Strata SJDatabricks @ Strata SJ
Databricks @ Strata SJ
 
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
Dr. Elephant for Monitoring and Tuning Apache Spark Jobs on Hadoop with Carl ...
 
Spark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan KesslerSpark Summit EU talk by Stephan Kessler
Spark Summit EU talk by Stephan Kessler
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
How We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IOHow We Optimize Spark SQL Jobs With parallel and sync IO
How We Optimize Spark SQL Jobs With parallel and sync IO
 
H2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SFH2O Rains with Databricks Cloud - Parisoma SF
H2O Rains with Databricks Cloud - Parisoma SF
 
The Revolution Will be Streamed
The Revolution Will be StreamedThe Revolution Will be Streamed
The Revolution Will be Streamed
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
 
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-LearnApache® Spark™ MLlib: From Quick Start to Scikit-Learn
Apache® Spark™ MLlib: From Quick Start to Scikit-Learn
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 

Similar to ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark

Data Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin NotebookData Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin NotebookCarolyn Duby
 
If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!gagravarr
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation WorkflowsSCAPE Project
 
Creating Reusable Geospatial Pipelines
Creating Reusable Geospatial PipelinesCreating Reusable Geospatial Pipelines
Creating Reusable Geospatial PipelinesDatabricks
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on EverythingDavid Phillips
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OSri Ambati
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleDomino Data Lab
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDatabricks
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
EAD3 Progress Report 2014-08-13
EAD3 Progress Report 2014-08-13EAD3 Progress Report 2014-08-13
EAD3 Progress Report 2014-08-13Michael Rush
 
Cross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UI
Cross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UICross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UI
Cross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UIThomas Daly
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsSimon Cockell
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionSri Ambati
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigLester Martin
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxMark Kromer
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 

Similar to ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark (20)

Data Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin NotebookData Science at Scale with Apache Spark and Zeppelin Notebook
Data Science at Scale with Apache Spark and Zeppelin Notebook
 
If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!If You Have The Content, Then Apache Has The Technology!
If You Have The Content, Then Apache Has The Technology!
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation Workflows
 
Creating Reusable Geospatial Pipelines
Creating Reusable Geospatial PipelinesCreating Reusable Geospatial Pipelines
Creating Reusable Geospatial Pipelines
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Presto: Fast SQL on Everything
Presto: Fast SQL on EverythingPresto: Fast SQL on Everything
Presto: Fast SQL on Everything
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up SeattleScala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
EAD3 Progress Report 2014-08-13
EAD3 Progress Report 2014-08-13EAD3 Progress Report 2014-08-13
EAD3 Progress Report 2014-08-13
 
Cross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UI
Cross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UICross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UI
Cross Site Collection Navigation with SPFX, PowerShell PnP, PnP-JS, Office UI
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs Pig
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptx
 
SolrCloud on Hadoop
SolrCloud on HadoopSolrCloud on Hadoop
SolrCloud on Hadoop
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 

Recently uploaded

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Recently uploaded (20)

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

ODSC East 2017 - Reproducible Research at Scale with Apache Zeppelin and Spark