SlideShare ist ein Scribd-Unternehmen logo
1 von 54
Downloaden Sie, um offline zu lesen
Data Science lifecycle with Apache Zeppelin
http://zeppelin.apache.org
Moon
Creator of Apache Zeppelin
Co-founder NFLabs
Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
2013. 10 Opensource interactive analytics feature as ‘Zeppelin’
2013. 10 2014. 08
Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
2013. 10 Opensource interactive analytics feature as ‘Zeppelin’
2014. 12 ASF incubation
Incubation Status http://incubator.apache.org/projects/zeppelin.html
Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
2013. 10 Opensource interactive analytics feature as ‘Zeppelin’
2014. 12 ASF incubation
2016. 10 157 Contributors world wide
2071 Stars on github repo
6 Releases
One of the most popular project in ASF
Collect
ETL /
Process
Analysis
Report
Data
Product
Life cycle of big data
Data
Engineer
Data
Scientist
Business user
Customer
Zeppelin
A web-based notebook that enables interactive data analytics. You can
make beautiful data-driven, interactive and collaborative documents with
SQL, Scala and more.
Zeppelin
JDBC
Markdown > _ Shell
Interpreter : pluggable layer for language / processing backend integration
20+ interpreters are supported officially
2016. 03. Interpreters in Zeppelin source tree. Does not include 3rd party interpreters
Zeppelin
Interpreter : pluggable layer for language / processing backend integration
Zeppelin
Interpreter : Easy to extend
public abstract class Interpreter {
public void open();
public void close();
public InterpreterResult interpret(String st, InterpreterContext context);
public void cancel(InterpreterContext context);
public int getProgress(InterpreterContext context);
public List<String> completion(String buf, int cursor);
public FormType getFormType();
public Scheduler getScheduler();
}
{Must have
{Good to have
Advanced {
Zeppelin
Notebook Repo : pluggable layer for notebook persistence
5+ Notebook repos are supported officially
2016. 03. Notebook repos in Zeppelin source tree. Does not include 3rd party interpreters
ZeppelinHub
Zeppelin
Notebook Repo : Easy to extend
public interface NotebookRepo {
public List<NoteInfo> list() throws IOException;
public Note get(String noteId) throws IOException;
public void save(Note note) throws IOException;
public void remove(String noteId) throws IOException;
public void checkpoint(String noteId, String checkPointName) throws IOException;
public void close();
}
Zeppelin
Visualizations : 6 Built-in visualizations comes with pivot
Table Bar Pie Area Line Scatter
Free to draw any customized visualizations inside of notebook
…
He liumHe
2
Platform for data analytics application that
makes visualization pluggable and more.
http://issues.apache.org/jira/browse/ZEPPELIN-533
https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal
Proposal
Umbrella issue
Makes Zeppelin fly!
He liumHe
2
RESTful API Websocket
Interpreter Notebook Storage
Spark
Flink
Geode
JDBC
…
FileSystem
AmazonS3
Git
…
ZeppelinServer
Interpreters and Notebook storage are pluggable
He liumHe
2
Interpreter Notebook Storage
Spark
Flink
Geode
JDBC
…
FileSystem
AmazonS3
Git
…
ZeppelinServer
Visualizations
Map
WordCloud
…
We want visualization be pluggable
He liumHe
2
Interpreter Notebook Storage
Spark
Flink
Geode
JDBC …
FileSystem
AmazonS3
Git
…
Application
Visualizations
Map
WordCloud
…
Resource Pool
SparkContext Flink Environment JDBC connection …
Analytics
…
…
User object
Extend pluggable visualization to pluggable analytics application
Helium Application: Easy to extend
public abstract class Application {
public Application(ApplicationContext context);
public abstract void run(ResourceSet args);
public abstract void unload();
}
He liumHe
2
Launcher: Suggest application according to data type in ResourcePool
He liumHe
2
& Enterprise
Jongyoul Lee
PMC of Apache Zeppelin
Software Development Engineer at NFLabs
& Enterprise
More than 1000 employers use Apache Zeppelin
Supports Apache Zeppelin as an internal service
Recommendation team uses Apache Zeppelin
Monitors their infrastructures via Apache Zeppelin
& Enterprise
& Enterprise History
~ 0.6
• NOTHING!!!
0.6.x
• Authentication & Authorization
• Note level permission
• Note level isolation
• Partially supported by Livy
& Enterprise Future
0.7.0
• Enterprise Support
• Multi users environment
• Impersonation on Spark/JDBC interpreter
• Job management
• Interpreter
• Improvement on JDBC/Python interpreter
• Frontend performance improvement
• Pluggable visualization
& Enterprise Future
0.7.0
• Enterprise Support
• Multi users environment
• Impersonation on Spark/JDBC interpreter
• Job management
• Interpreter
• Improvement on JDBC/Python interpreter
• Frontend performance improvement
• Pluggable visualization
& Enterprise
RESTful API Websocket
Interpreter Notebook Storage
Spark
Flink
Geode
JDBC
…
FileSystem
AmazonS3
Git
…
ZeppelinServer
Multi-tenancy
& Enterprise
RESTful API Websocket
Interpreter Notebook Storage
Spark
Flink
Geode
JDBC
…
FileSystem
AmazonS3
Git
…
ZeppelinServer
NO USER
Multi-tenancy
Shared, Isolated, Scoped
& Enterprise
ZeppelinServer
SparkInterpreter
Run P1 on NoteA
Run SparkInterpreter for P1
User1
Multi-tenancy
& Enterprise
ZeppelinServer
SparkInterpreter
Run P1 on NoteA
Run SparkInterpreter for P1
User1
User2
Run P2 on NoteB Run SparkInterpreter for P2
Multi-tenancy
& Enterprise
• Originally implemented
• Pros
• Simple structure
• Predictable behavior
• Cons
• All resources shared
• Interference among users
Multi-tenancy
Shared
& Enterprise
ZeppelinServer
SparkInterpreter
Run P1 on NoteA
Run SparkInterpreter for P1
User1
User2
Run P2 on NoteB
Run SparkInterpreter for P2 SparkInterpreter
Multi-tenancy
& Enterprise
• Pros
• No pending
• No resources shared
• Cons
• Lots of memory
• Inefficiency of using memory
• Limited by resources
Multi-tenancy
Isolated
& Enterprise
ZeppelinServer
SparkInterpreter
Run P1 on NoteA
Run SparkInterpreter for P1
User1
User2
Run P2 on NoteB
Run SparkInterpreter for P2 SparkInterpreter
Multi-tenancy
& Enterprise
ZeppelinServer
JDBCInterpreter
Run P2 on NoteA
Run SparkInterpreter for P2
User1
User2
Run P3 on NoteB
Run SparkInterpreter for P3 JDBCInterpreter
Multi-tenancy
& Enterprise
ZeppelinServer
JDBCInterpreter
Run P2 on NoteA
Run SparkInterpreter for P2
User1
User2
Run P3 on NoteB Run SparkInterpreter for P3
Multi-tenancy
JDBCInstance
User1
JDBCInstance
User2
& Enterprise
• Pros
• Less memory
• Some resources Isolated
• Cons
• Some resources shared
• Big single process
Multi-tenancy
Scoped
& Enterprise Future
0.7.0
• Enterprise Support
• Multi users environment
• Impersonation on Spark/JDBC interpreter
• Job management
• Interpreter
• Improvement on JDBC/Python interpreter
• Frontend performance improvement
• Pluggable visualization
& Enterprise Impersonation
What if all users use different credentials?
& Enterprise Impersonation
& Enterprise Impersonation
Credentials
• Already merged by Twitter at Mar. 2016
• Never used in any interpreter
& Enterprise Impersonation
• JDBC
• Set user and password in properties
• https://issues.apache.org/jira/browse/ZEPPELIN-1567
• Spark
• Adopt ugi.doAs()
• https://issues.apache.org/jira/browse/ZEPPELIN-1572
& Enterprise Impersonation
& Enterprise Future
0.7.0
• Enterprise Support
• Multi users environment
• Impersonation on Spark/JDBC interpreter
• Job management
• Interpreter
• Improvement on JDBC/Python interpreter
• Frontend performance improvement
• Pluggable visualization
& Enterprise Job mgmt
& Enterprise Future
0.7.0
• Enterprise Support
• Multi users environment
• Impersonation on Spark/JDBC interpreter
• Job management
• Interpreter
• Improvement on JDBC/Python interpreter
• Frontend performance improvement
• Pluggable visualization
• JDBC
• Connection pool
• Stabilization for BI
• Python
• Matplot library
• Support on python user
& Enterprise Interpreters
& Enterprise Future
0.7.0
• Enterprise Support
• Multi users environment
• Impersonation on Spark/JDBC interpreter
• Job management
• Interpreter
• Improvement on JDBC/Python interpreter
• Frontend performance improvement
• Pluggable visualization
• Frontend
• Fine-grained broadcast of WebSocket
• Betterment of rendering DOM
• Pluggable visualization
• lium
& Enterprise Frontend
He
2
Zeppelin
Homepage
http://zeppelin.apache.org/

Mailing list
users@zeppelin.apache.org
dev@zeppelin.apache.org
Issue tracker
https://issues.apache.org/jira/browse/ZEPPELIN
Github repository
http://github.com/apache/zeppelin
Join the community
Thank you
Moon soo Lee
moon@nflabs.com
https://twitter.com/issuefreaks
Jongyoul Lee
jongyoul@nflabs.com
https://twitter.com/madeng

Más contenido relacionado

Was ist angesagt?

Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzDatabricks
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaDatabricks
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelinfelixcss
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 PivotalOpenSourceHub
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark Summit
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusDatabricks
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit
 
Helium makes Zeppelin fly!
Helium makes Zeppelin fly!Helium makes Zeppelin fly!
Helium makes Zeppelin fly!DataWorks Summit
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiDatabricks
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversityAlex Zeltov
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraDatabricks
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Spark Summit
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...Databricks
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh LalwaniDatabricks
 
Bringing complex event processing to Spark streaming
Bringing complex event processing to Spark streamingBringing complex event processing to Spark streaming
Bringing complex event processing to Spark streamingDataWorks Summit
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit
 

Was ist angesagt? (20)

Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan VolzArchiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
Archiving, E-Discovery, and Supervision with Spark and Hadoop with Jordan Volz
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
 
SparkR + Zeppelin
SparkR + ZeppelinSparkR + Zeppelin
SparkR + Zeppelin
 
Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16 Apache Zeppelin Meetup Christian Tzolov 1/21/16
Apache Zeppelin Meetup Christian Tzolov 1/21/16
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub Hava
 
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
Spark-Streaming-as-a-Service with Kafka and YARN: Spark Summit East talk by J...
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
 
Spark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis GkoufasSpark Summit EU talk by Yiannis Gkoufas
Spark Summit EU talk by Yiannis Gkoufas
 
Helium makes Zeppelin fly!
Helium makes Zeppelin fly!Helium makes Zeppelin fly!
Helium makes Zeppelin fly!
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
Migrating from Redshift to Spark at Stitch Fix: Spark Summit East talk by Sky...
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh Lalwani
 
Bringing complex event processing to Spark streaming
Bringing complex event processing to Spark streamingBringing complex event processing to Spark streaming
Bringing complex event processing to Spark streaming
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 

Andere mochten auch

4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @ShanghaiLuke Han
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseDataWorks Summit/Hadoop Summit
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsBen Laird
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017
Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017
Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017NVIDIA
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017NVIDIA
 

Andere mochten auch (7)

4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
 
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
 
What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS What's new in Hadoop Common and HDFS
What's new in Hadoop Common and HDFS
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017
Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017
Revolutionizing Radiology with Deep Learning: The Road to RSNA 2017
 
Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017Top 5 Deep Learning and AI Stories - October 6, 2017
Top 5 Deep Learning and AI Stories - October 6, 2017
 

Ähnlich wie Data science lifecycle with Apache Zeppelin

Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
 
Quick Tour On Zeppelin
Quick Tour On ZeppelinQuick Tour On Zeppelin
Quick Tour On ZeppelinKnoldus Inc.
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Zeppelin meetup 2016 madrid
Zeppelin meetup 2016 madridZeppelin meetup 2016 madrid
Zeppelin meetup 2016 madridJongyoul Lee
 
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache ZeppelinMoon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache ZeppelinFlink Forward
 
Running Apache Zeppelin production
Running Apache Zeppelin productionRunning Apache Zeppelin production
Running Apache Zeppelin productionVinay Shukla
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in EnterpriseDataWorks Summit
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Databricks
 
Apache Zeppelin and Helium @ApacheCon 2017 may, FL
Apache Zeppelin and Helium  @ApacheCon 2017 may, FLApache Zeppelin and Helium  @ApacheCon 2017 may, FL
Apache Zeppelin and Helium @ApacheCon 2017 may, FLAhyoung Ryu
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache FlinkAKASH SIHAG
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 
Collaborative data science and how to build a data science toolchain around n...
Collaborative data science and how to build a data science toolchain around n...Collaborative data science and how to build a data science toolchain around n...
Collaborative data science and how to build a data science toolchain around n...Moon Soo Lee
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8Janu Jahnavi
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8Janu Jahnavi
 
Is 12 Factor App Right About Logging
Is 12 Factor App Right About LoggingIs 12 Factor App Right About Logging
Is 12 Factor App Right About LoggingPhil Wilkins
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 

Ähnlich wie Data science lifecycle with Apache Zeppelin (20)

Apache Zeppelin Helium and Beyond
Apache Zeppelin Helium and BeyondApache Zeppelin Helium and Beyond
Apache Zeppelin Helium and Beyond
 
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and BeyondApache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Quick Tour On Zeppelin
Quick Tour On ZeppelinQuick Tour On Zeppelin
Quick Tour On Zeppelin
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Zeppelin meetup 2016 madrid
Zeppelin meetup 2016 madridZeppelin meetup 2016 madrid
Zeppelin meetup 2016 madrid
 
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache ZeppelinMoon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
Moon soo Lee – Data Science Lifecycle with Apache Flink and Apache Zeppelin
 
Running Apache Zeppelin production
Running Apache Zeppelin productionRunning Apache Zeppelin production
Running Apache Zeppelin production
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in Enterprise
 
Session 203 iouc summit database
Session 203 iouc summit databaseSession 203 iouc summit database
Session 203 iouc summit database
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
 
Apache Zeppelin and Helium @ApacheCon 2017 may, FL
Apache Zeppelin and Helium  @ApacheCon 2017 may, FLApache Zeppelin and Helium  @ApacheCon 2017 may, FL
Apache Zeppelin and Helium @ApacheCon 2017 may, FL
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
Collaborative data science and how to build a data science toolchain around n...
Collaborative data science and how to build a data science toolchain around n...Collaborative data science and how to build a data science toolchain around n...
Collaborative data science and how to build a data science toolchain around n...
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
Is 12 Factor App Right About Logging
Is 12 Factor App Right About LoggingIs 12 Factor App Right About Logging
Is 12 Factor App Right About Logging
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 

Mehr von DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Mehr von DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Último

Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechProduct School
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1DianaGray10
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud DataEric D. Schabell
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameKapil Thakar
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FESTBillieHyde
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveIES VE
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4DianaGray10
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfInfopole1
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)codyslingerland1
 

Último (20)

Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
Flow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First FrameFlow Control | Block Size | ST Min | First Frame
Flow Control | Block Size | ST Min | First Frame
 
Technical SEO for Improved Accessibility WTS FEST
Technical SEO for Improved Accessibility  WTS FESTTechnical SEO for Improved Accessibility  WTS FEST
Technical SEO for Improved Accessibility WTS FEST
 
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES LiveKeep Your Finger on the Pulse of Your Building's Performance with IES Live
Keep Your Finger on the Pulse of Your Building's Performance with IES Live
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdf
 
The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)The New Cloud World Order Is FinOps (Slideshow)
The New Cloud World Order Is FinOps (Slideshow)
 

Data science lifecycle with Apache Zeppelin

  • 1. Data Science lifecycle with Apache Zeppelin http://zeppelin.apache.org
  • 2. Moon Creator of Apache Zeppelin Co-founder NFLabs
  • 3. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark
  • 4. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark 2013. 10 Opensource interactive analytics feature as ‘Zeppelin’ 2013. 10 2014. 08
  • 5. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark 2013. 10 Opensource interactive analytics feature as ‘Zeppelin’ 2014. 12 ASF incubation Incubation Status http://incubator.apache.org/projects/zeppelin.html
  • 6. Zeppelin 2012. 12 Data analytics solution based on AMP Lab Spark/Shark 2013. 10 Opensource interactive analytics feature as ‘Zeppelin’ 2014. 12 ASF incubation 2016. 10 157 Contributors world wide 2071 Stars on github repo 6 Releases One of the most popular project in ASF
  • 7. Collect ETL / Process Analysis Report Data Product Life cycle of big data Data Engineer Data Scientist Business user Customer
  • 8. Zeppelin A web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
  • 9. Zeppelin JDBC Markdown > _ Shell Interpreter : pluggable layer for language / processing backend integration 20+ interpreters are supported officially 2016. 03. Interpreters in Zeppelin source tree. Does not include 3rd party interpreters
  • 10. Zeppelin Interpreter : pluggable layer for language / processing backend integration
  • 11. Zeppelin Interpreter : Easy to extend public abstract class Interpreter { public void open(); public void close(); public InterpreterResult interpret(String st, InterpreterContext context); public void cancel(InterpreterContext context); public int getProgress(InterpreterContext context); public List<String> completion(String buf, int cursor); public FormType getFormType(); public Scheduler getScheduler(); } {Must have {Good to have Advanced {
  • 12. Zeppelin Notebook Repo : pluggable layer for notebook persistence 5+ Notebook repos are supported officially 2016. 03. Notebook repos in Zeppelin source tree. Does not include 3rd party interpreters ZeppelinHub
  • 13. Zeppelin Notebook Repo : Easy to extend public interface NotebookRepo { public List<NoteInfo> list() throws IOException; public Note get(String noteId) throws IOException; public void save(Note note) throws IOException; public void remove(String noteId) throws IOException; public void checkpoint(String noteId, String checkPointName) throws IOException; public void close(); }
  • 14. Zeppelin Visualizations : 6 Built-in visualizations comes with pivot Table Bar Pie Area Line Scatter Free to draw any customized visualizations inside of notebook …
  • 15. He liumHe 2 Platform for data analytics application that makes visualization pluggable and more. http://issues.apache.org/jira/browse/ZEPPELIN-533 https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal Proposal Umbrella issue Makes Zeppelin fly!
  • 16. He liumHe 2 RESTful API Websocket Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … ZeppelinServer Interpreters and Notebook storage are pluggable
  • 17. He liumHe 2 Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … ZeppelinServer Visualizations Map WordCloud … We want visualization be pluggable
  • 18. He liumHe 2 Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … Application Visualizations Map WordCloud … Resource Pool SparkContext Flink Environment JDBC connection … Analytics … … User object Extend pluggable visualization to pluggable analytics application
  • 19. Helium Application: Easy to extend public abstract class Application { public Application(ApplicationContext context); public abstract void run(ResourceSet args); public abstract void unload(); } He liumHe 2
  • 20. Launcher: Suggest application according to data type in ResourcePool He liumHe 2
  • 22. Jongyoul Lee PMC of Apache Zeppelin Software Development Engineer at NFLabs
  • 23. & Enterprise More than 1000 employers use Apache Zeppelin Supports Apache Zeppelin as an internal service Recommendation team uses Apache Zeppelin Monitors their infrastructures via Apache Zeppelin
  • 25. & Enterprise History ~ 0.6 • NOTHING!!! 0.6.x • Authentication & Authorization • Note level permission • Note level isolation • Partially supported by Livy
  • 26. & Enterprise Future 0.7.0 • Enterprise Support • Multi users environment • Impersonation on Spark/JDBC interpreter • Job management • Interpreter • Improvement on JDBC/Python interpreter • Frontend performance improvement • Pluggable visualization
  • 27. & Enterprise Future 0.7.0 • Enterprise Support • Multi users environment • Impersonation on Spark/JDBC interpreter • Job management • Interpreter • Improvement on JDBC/Python interpreter • Frontend performance improvement • Pluggable visualization
  • 28. & Enterprise RESTful API Websocket Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … ZeppelinServer Multi-tenancy
  • 29. & Enterprise RESTful API Websocket Interpreter Notebook Storage Spark Flink Geode JDBC … FileSystem AmazonS3 Git … ZeppelinServer NO USER Multi-tenancy
  • 31. & Enterprise ZeppelinServer SparkInterpreter Run P1 on NoteA Run SparkInterpreter for P1 User1 Multi-tenancy
  • 32. & Enterprise ZeppelinServer SparkInterpreter Run P1 on NoteA Run SparkInterpreter for P1 User1 User2 Run P2 on NoteB Run SparkInterpreter for P2 Multi-tenancy
  • 33. & Enterprise • Originally implemented • Pros • Simple structure • Predictable behavior • Cons • All resources shared • Interference among users Multi-tenancy Shared
  • 34. & Enterprise ZeppelinServer SparkInterpreter Run P1 on NoteA Run SparkInterpreter for P1 User1 User2 Run P2 on NoteB Run SparkInterpreter for P2 SparkInterpreter Multi-tenancy
  • 35. & Enterprise • Pros • No pending • No resources shared • Cons • Lots of memory • Inefficiency of using memory • Limited by resources Multi-tenancy Isolated
  • 36. & Enterprise ZeppelinServer SparkInterpreter Run P1 on NoteA Run SparkInterpreter for P1 User1 User2 Run P2 on NoteB Run SparkInterpreter for P2 SparkInterpreter Multi-tenancy
  • 37. & Enterprise ZeppelinServer JDBCInterpreter Run P2 on NoteA Run SparkInterpreter for P2 User1 User2 Run P3 on NoteB Run SparkInterpreter for P3 JDBCInterpreter Multi-tenancy
  • 38. & Enterprise ZeppelinServer JDBCInterpreter Run P2 on NoteA Run SparkInterpreter for P2 User1 User2 Run P3 on NoteB Run SparkInterpreter for P3 Multi-tenancy JDBCInstance User1 JDBCInstance User2
  • 39. & Enterprise • Pros • Less memory • Some resources Isolated • Cons • Some resources shared • Big single process Multi-tenancy Scoped
  • 40. & Enterprise Future 0.7.0 • Enterprise Support • Multi users environment • Impersonation on Spark/JDBC interpreter • Job management • Interpreter • Improvement on JDBC/Python interpreter • Frontend performance improvement • Pluggable visualization
  • 42. What if all users use different credentials?
  • 44. & Enterprise Impersonation Credentials • Already merged by Twitter at Mar. 2016 • Never used in any interpreter
  • 46. • JDBC • Set user and password in properties • https://issues.apache.org/jira/browse/ZEPPELIN-1567 • Spark • Adopt ugi.doAs() • https://issues.apache.org/jira/browse/ZEPPELIN-1572 & Enterprise Impersonation
  • 47. & Enterprise Future 0.7.0 • Enterprise Support • Multi users environment • Impersonation on Spark/JDBC interpreter • Job management • Interpreter • Improvement on JDBC/Python interpreter • Frontend performance improvement • Pluggable visualization
  • 49. & Enterprise Future 0.7.0 • Enterprise Support • Multi users environment • Impersonation on Spark/JDBC interpreter • Job management • Interpreter • Improvement on JDBC/Python interpreter • Frontend performance improvement • Pluggable visualization
  • 50. • JDBC • Connection pool • Stabilization for BI • Python • Matplot library • Support on python user & Enterprise Interpreters
  • 51. & Enterprise Future 0.7.0 • Enterprise Support • Multi users environment • Impersonation on Spark/JDBC interpreter • Job management • Interpreter • Improvement on JDBC/Python interpreter • Frontend performance improvement • Pluggable visualization
  • 52. • Frontend • Fine-grained broadcast of WebSocket • Betterment of rendering DOM • Pluggable visualization • lium & Enterprise Frontend He 2
  • 54. Thank you Moon soo Lee moon@nflabs.com https://twitter.com/issuefreaks Jongyoul Lee jongyoul@nflabs.com https://twitter.com/madeng