LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION AND BIG DATA STACK PROVISIONING
Evans Ye, Sr. Software Engineer
DataWorks Summit San Jose 2017
Who am I
• Tech Lead @ APAC Data Team, Y! Taiwan
• Building data products for E-Commerce business
• PMC chair of Apache Bigtop, ASF member
2
Outline
• Quick Intro to Apache Bigtop
• Docker for Bigtop Packaging
• Docker for Bigtop Provisioner
• Docker for Bigtop Sandbox
• Releases
3
QUICK INTRO TO APACHE BIGTOP
4
Linux Distributions
5
Hadoop Distributions
6
But there are some other great Hadoop ecosystem components...
7
How do I add patches?
8
9
From source code to packages
Bigtop Packaging
10
Supported components
11
Bigtop feature set
Packaging, Testing, Deployment, Virtualization
for you to easily build your own Big Data Stack
12
Community stats
• 94 total contributors
• Spark: 1093, Hadoop: 99, HBase: 126, Hive: 115
• 5 years since 2012
• 30 Hadoop ecosystem components packaged
• 5 Linux distros and 2 architectures supported
13
DOCKER FOR BIGTOP PACKAGING
14
Preparing build environment
15
Preparing build environment
…

Seriously?
16
Bigtop Toolchain
• Puppet recipes to install required libraries and build tools
• Prerequisite: Java
• To prepare a build environment:
git clone https://github.com/apache/bigtop.git
cd bigtop
./bigtop_toolchain/bin/puppetize.sh
./gradlew toolchain
17
CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
18
CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
Bigtop Toolchain
19
Dockerized CI infrastructure
CentOS slave
Fedora slave
Ubuntu slave
Debian slave
OpenSuSE slave
• Immutable env
• Fault tolerance
21
How to build packages
• Execute shell
• Bigtop CI Setup Guide
# Parameters (e.g. set by the Jenkins job); defaults shown
OS=${OS:-debian-8}
COMPONENT=${COMPONENT:-hadoop}
docker run -u jenkins --rm \
  -v "$(pwd)":/bigtop --workdir /bigtop \
  bigtop/slaves:trunk-$OS \
  bash -l -c "./gradlew allclean $COMPONENT-pkg"
23
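The command above builds one component on one OS. The CI matrix covers several slave images, and the same invocation can be looped over all of them. This is a minimal sketch, not Bigtop's own CI script. The `build_pkg` and `build_matrix` helpers and the `DRY_RUN` switch are hypothetical. Every image tag other than `trunk-debian-8` is an assumed example.

```sh
#!/bin/sh
# Sketch: run the package build from the slide on several CI slave images.
# Assumption: images are tagged bigtop/slaves:trunk-<os>; any <os> value
# other than debian-8 is a hypothetical example.

# Print the command instead of running it when DRY_RUN=1
# (argument quoting is not preserved in the printed form).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_pkg OS COMPONENT: the slide's docker invocation, parameterized.
build_pkg() {
  local os="$1" component="$2"
  run docker run -u jenkins --rm \
    -v "$(pwd)":/bigtop --workdir /bigtop \
    "bigtop/slaves:trunk-${os}" \
    bash -l -c "./gradlew allclean ${component}-pkg"
}

# build_matrix COMPONENT OS...: build on each OS in turn,
# stopping at the first one that fails.
build_matrix() {
  local component="$1" os
  shift
  for os in "$@"; do
    echo "=== ${component} on ${os} ===" >&2
    build_pkg "$os" "$component" || return 1
  done
}
```

For example, `DRY_RUN=1 build_matrix hadoop debian-8 centos-7` prints the two `docker run` commands without executing them. Without `DRY_RUN=1`, the builds run for real and stop at the first OS that fails.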
Bigtop packages on master
https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
24
Extremely friendly for porting
• Example: How to port the Bigtop distribution to PPC64LE?
▪ Prepare a PPC64LE Docker base image
▪ Apply Bigtop Toolchain on the PPC64LE Docker image
▪ Build Bigtop packages on the PPC64LE slave image
• 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches
• Credit: Amir Sanjar, IBM
25
Bigtop early mission accomplished
Leveraged by app providers…
26
Getting out of the Apache dome
27
New focus and target user
• Data engineers vs. distro builders
• Solution diversity:
▪ Streaming: Flink, Apex
▪ In-memory cache: Alluxio, Ignite
▪ User/developer tools:
▪ Bigtop Provisioner
▪ Bigtop Sandbox
• Big data stack references
• Machine learning, deep learning components
28
DOCKER FOR BIGTOP PROVISIONER
29
Bigtop Provisioner
• A tool to demonstrate the full life cycle of Bigtop: Packaging, Testing, Deployment, Virtualization
• Bigtop Provisioner: Create resources → Run Bigtop Puppet → Run Bigtop Tests
30
• We use Vagrant as an abstraction layer to support different kinds of resource providers
[Diagram: Vagrant and its providers]
One-click Hadoop provisioning (Bigtop 1.0.0)
bigtop/deploy image on Docker Hub
./docker-hadoop.sh -c 3
puppet apply (on each of the 3 containers)
32
https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
Problems with Vagrant’s Docker Provider
• Need to add the Vagrant public key into Docker images
• Too many issues with the auto-created boot2docker VM
• A bug in the Docker provider regarding provisioning has stayed open for 2 years
▪ 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers anyway
• Not all Docker options are supported in the Vagrantfile
• ^#?& slow
33
Replaced by docker-compose (Bigtop 1.2.0)
./docker-hadoop.sh -c 3
puppet apply (on each of the 3 containers)
34
bigtop/deploy image on Docker Hub
Advantages
• No need to create a customized image beforehand
• Better compatibility with Docker's native solutions
• Clear, simple YAML file for orchestration settings
• Supports new features such as overlay networks
• Can leverage Swarm for multi-node cluster deployment
• Fast → better user experience
35
How to run Docker Provisioner
• Execute shell
• Bigtop CI Setup Guide
# See bigtop/provisioner/docker/*.yaml for example configs
CONFIG=YOUR_CUSTOM_CONF.yaml
# provision
./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 \
  docker-provisioner
# destroy the provisioned cluster
./gradlew docker-provisioner-destroy
36
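When the provisioner runs inside a CI job, the cluster should normally be torn down even if provisioning fails, so that containers do not pile up on the build host. This is a minimal sketch. It assumes the script is run from the root of a Bigtop checkout, and it uses only the two gradle tasks shown above. The `with_cluster` and `DRY_RUN` helpers are hypothetical and not part of Bigtop.

```sh
#!/bin/sh
# Sketch: provision a cluster, optionally run a command against it, and
# always destroy it afterwards. Run from the root of a Bigtop checkout.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# with_cluster CONFIG NUM_INSTANCES [CMD...]
# Returns the status of the first failing step (provisioning or CMD).
with_cluster() {
  local config="$1" num="$2" status=0
  shift 2
  run ./gradlew -Pconfig="$config" -Pnum_instances="$num" docker-provisioner || status=$?
  # Only run the optional command if provisioning succeeded.
  if [ "$status" -eq 0 ] && [ "$#" -gt 0 ]; then
    run "$@" || status=$?
  fi
  # Tear down regardless of what happened above; the teardown's own
  # exit status is ignored.
  run ./gradlew docker-provisioner-destroy
  return "$status"
}
```

For example, `with_cluster YOUR_CUSTOM_CONF.yaml 3 ./my-checks.sh` provisions three containers, runs a hypothetical `./my-checks.sh`, and then destroys the cluster.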
YOUR_CUSTOM_CONF.yaml example
37
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:centos-7"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
distro: centos
components: [hdfs, yarn, mapreduce]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, mapreduce]
38
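Hand-editing one YAML file per experiment gets tedious, and a small helper can render the same keys from arguments. This sketch assumes the key layout of the example above, with `memory_limit` and `image` nested under `docker`. The `write_conf` name and its argument order are hypothetical. Values are written verbatim without validation.

```sh
#!/bin/sh
# Sketch: render a Docker Provisioner config with the same keys as the
# example above. Values are written verbatim; nothing is validated.

# write_conf FILE DISTRO IMAGE REPO COMPONENTS
# COMPONENTS is a comma-separated list such as "hdfs, yarn, mapreduce" and is
# used for both components and smoke_test_components. memory_limit is fixed
# at "4g" as in the example.
write_conf() {
  local file="$1" distro="$2" image="$3" repo="$4" components="$5"
  cat > "$file" <<EOF
docker:
  memory_limit: "4g"
  image: "${image}"
repo: "${repo}"
distro: ${distro}
components: [${components}]
enable_local_repo: false
smoke_test_components: [${components}]
EOF
}
```

The following call reproduces the example config: `write_conf YOUR_CUSTOM_CONF.yaml centos bigtop/puppet:centos-7 "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64" "hdfs, yarn, mapreduce"`.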
Visibility for deployments
38
Use cases
• For application developers, cluster admins, and users
▪ Run a Hadoop cluster to test your code on
▪ Try and test configurations before applying them to production
▪ Play around with Bigtop Big Data Stacks
• For contributors
▪ Easily test your packaging, deployment, and testing code
• For distro builders
▪ CI matrix → easier to patch upstream code
39
DOCKER FOR BIGTOP SANDBOX
40
Introducing Bigtop Sandbox
• Easy way to get started
• Docker images that have Bigtop stacks installed and configured
• Pseudo-distributed cluster up and running without installation
• Command-line tool to build your own stack
41
Docker image layer
Interface
Customized big data stack
Deployment & management tool
Base image (OS)
42
Docker image layer
Concrete implementation
HDFS + YARN + Spark
Bigtop Puppet
bigtop/puppet:ubuntu-16.04
43
Building images
Ubuntu 16.04
Bigtop Puppet
HDFS + YARN + Spark
+ site.yaml
$ puppet apply
44
site.yaml example
45
bigtop::hadoop_head_node: bigtop.example.com
bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
hadoop::hadoop_storage_dirs: [/data/1, /data/2]
hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
How to build
git clone https://github.com/apache/bigtop.git
cd bigtop/docker/sandbox
./build.sh -a bigtop -o ubuntu-16.04 \
  -c "hdfs, yarn, spark"
• Or specify your custom conf:
./build.sh -a bigtop -o ubuntu-16.04 \
  -f custom_site.yaml -t dws2017
46
Running images
HDFS + YARN + Spark
$ puppet apply
47
How to run
docker run --name sandbox -d \
  -p 50070:50070 -p 8088:8088 \
  evansye/sandbox:dws2017
docker logs -f sandbox
docker exec sandbox spark-example SparkPi
48
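`docker run -d` returns as soon as the container starts. The Running images slide suggests that `puppet apply` runs inside the container at startup, so services may not be up yet when `docker exec` is called. The sketch below polls the web UIs first. `wait_until` and `sandbox_ready` are hypothetical helpers. The readiness check assumes `curl` is installed on the host and the port mappings from the `docker run` above: NameNode UI on 50070 and ResourceManager UI on 8088.

```sh
#!/bin/sh
# Sketch: wait until the sandbox web UIs answer before running a job.

# wait_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds between attempts.
wait_until() {
  local tries="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Assumes curl on the host and the -p mappings from the docker run above:
# NameNode web UI on 50070, ResourceManager web UI on 8088.
sandbox_ready() {
  curl -fs -o /dev/null http://localhost:50070/ &&
    curl -fs -o /dev/null http://localhost:8088/
}
```

For example, `wait_until 60 5 sandbox_ready && docker exec sandbox spark-example SparkPi` waits up to about five minutes before submitting the example.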
49
                    Bigtop Provisioner    Bigtop Sandbox
Scalable            Yes                   No
Portable            No                    Yes
Flexibility         High                  Medium
Speed               > 2 mins              > 15 secs
Requires network    Yes                   No
Port forwarding     No                    Yes
50
                    Bigtop Provisioner                          Bigtop Sandbox
Data engineers      Multi-node cluster testing                  Build/use sandboxes for dev & test
Ops                 Multi-node cluster testing                  Single-node testing
Contributors        Test packages, Puppet recipes, test cases   Test packages, Puppet recipes, test cases
Distro builders     Test packages, Puppet recipes, test cases   Provide sandboxes
51
Integration test in CI/CD pipeline
[Diagram: CD pipeline with Bigtop Sandbox. Source code → Compile → Unit Test → Build Image → Integration test with Sandbox (Sandbox Service, Data) → Push Image to Docker Registry → Deploy → FINISHED]
52
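The pipeline in the diagram can be approximated in a few lines of shell. This sketch is an illustration under assumptions, not the author's pipeline. It assumes compilation and unit tests run inside `docker build`. `./integration-test.sh`, the `it-sandbox` container name, and the application image name are hypothetical. Only the sandbox image tag comes from the slides. The sandbox container is removed even when the integration test fails, and the image is pushed only if every earlier stage passed. The Deploy step is left out.

```sh
#!/bin/sh
# Sketch of the CD flow in the diagram: build image -> integration test
# against a Bigtop Sandbox -> push. Compile and unit tests are assumed to run
# inside `docker build`; ./integration-test.sh and the it-sandbox container
# name are hypothetical.

SANDBOX_IMAGE="${SANDBOX_IMAGE:-evansye/sandbox:dws2017}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

stage() { echo "--- stage: $1" >&2; }

# Start a sandbox, run the tests against it, and always remove it.
# ./integration-test.sh is expected to wait until the sandbox is ready.
integration_test() {
  local status=0
  run docker run -d --name it-sandbox \
    -p 50070:50070 -p 8088:8088 "$SANDBOX_IMAGE" || return 1
  run ./integration-test.sh || status=$?
  run docker rm -f it-sandbox
  return "$status"
}

# pipeline APP_IMAGE: stop at the first failing stage; push only on success.
pipeline() {
  local app_image="$1"
  stage build-image && run docker build -t "$app_image" . &&
    stage integration-test && integration_test &&
    stage push && run docker push "$app_image"
}
```

With `DRY_RUN=1`, `pipeline myorg/app:test` prints the five commands it would run, in order. `myorg/app:test` is a placeholder image name.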
Future
• Production deployment using Sandbox images
▪ --net host or overlay network (SDN)?
▪ External volumes for edit logs, fsimages, etc.
▪ Cluster orchestration
▪ Swarm, Kubernetes?
53
RELEASES
54
Bigtop 1.2.0 released in April 2017
• New components:
▪ Ambari 2.5.0
▪ GPDB 5.0.0-alpha.0 (Greenplum)
• Featured upgrades:
▪ Hadoop 2.7.3
▪ Spark 2.1.0
▪ Kafka 0.10.1.1
▪ HBase 1.1.3
▪ and more
55
New features in Bigtop 1.2.0
• New features:
▪ Juju Bigtop charms
▪ Bigtop Sandbox (alpha; trying master is recommended)
• Improvements:
▪ Bigtop Docker Provisioner made faster
56
Juju Cloud Weather Report
http://bigtop.charm.qa/
57
Bigtop 1.2.1 upcoming
• Expected to be out in late June
• Hadoop 2.7.4 (interested in having Docker container support backported, but not sure yet)
• Mainly bug fixes:
▪ Packages
▪ Deployments
▪ Sandbox
58
Road ahead towards 1.3.0
• Machine learning and deep learning integration
• Support AArch64
• Extend component coverage in Bigtop Puppet (not all components are covered yet)
• Extend the CI matrix coverage to Bigtop Tests
• Ambari Bigtop stack integration
• Provide big data stack references
59
60
ODPi Apache Bigtop Test Drive Program
• Submit your proposal and contribute to Bigtop with funding!
• Improvements, new features, build, test, CI, etc.
• CFP opened June 13, 2017; closed July 14, 2017
• https://www.odpi.org/community/bigtopgrantfund
61
Reference
• Join the mailing list, ask questions, suggest features, etc.
• Contribute (components, tutorials, docs)
• Report bugs
▪ Home page: http://bigtop.apache.org/
▪ Mailing list: http://bigtop.apache.org/mail-lists.html
▪ Documentation: https://cwiki.apache.org/confluence/display/BIGTOP/Index
▪ Source code: https://github.com/apache/bigtop
▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP
62
63
Questions?

 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Kürzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Leveraging Docker for Hadoop Build Automation and Big Data Stack Provisioning

  • 1. LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION AND BIG DATA STACK PROVISIONING
    • Evans Ye, Sr. Software Engineer
    • DataWorks Summit San Jose 2017
  • 2. Who am I
    • Tech Lead @ APAC Data Team, Y! Taiwan
    • Building data products for E-Commerce business
    • PMC chair of Apache Bigtop, ASF member
  • 3. Outline
    • Quick Intro to Apache Bigtop
    • Docker for Bigtop Packaging
    • Docker for Bigtop Provisioner
    • Docker for Bigtop Sandbox
    • Releases
  • 4. QUICK INTRO TO APACHE BIGTOP
  • 7. But there are some other great Hadoop ecosystem components...
  • 8. How do I add patches?
  • 10. From source code to packages: Bigtop Packaging
  • 12. Bigtop feature set: Packaging, Testing, Deployment, Virtualization, so you can easily build your own Big Data Stack
  • 13. Community stats
    • 94 total contributors
    • Spark: 1093, Hadoop: 99, HBase: 126, Hive: 115
    • 5 years since 2012
    • 30 Hadoop ecosystem components packaged
    • 5 Linux distros and 2 architectures supported
  • 14. DOCKER FOR BIGTOP PACKAGING
  • 17. Bigtop Toolchain
    • Puppet recipes to install required libraries and build tools
    • To prepare a build environment (prerequisite: Java):

      git clone https://github.com/apache/bigtop.git
      cd bigtop
      ./bigtop_toolchain/bin/puppetize.sh
      ./gradlew toolchain
  • 18. CI infrastructure: CentOS, Fedora, Ubuntu, Debian, and OpenSuSE slaves
  • 19. CI infrastructure: Bigtop Toolchain applied on each of the CentOS, Fedora, Ubuntu, Debian, and OpenSuSE slaves
  • 21. Dockerized CI infrastructure: CentOS, Fedora, Ubuntu, Debian, and OpenSuSE slaves
    • Immutable environment
    • Fault tolerance
  • 23. How to build packages
    • Jenkins "Execute shell" build step (see the Bigtop CI Setup Guide):

      OS=debian-8
      COMPONENT=hadoop
      docker run -u jenkins --rm -v "$(pwd)":/bigtop --workdir /bigtop \
        bigtop/slaves:trunk-$OS bash -l -c "./gradlew allclean $COMPONENT-pkg"
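The single-build command above generalizes naturally to a package matrix. A minimal sketch, assuming bash and that a bigtop/slaves:trunk-<os> image exists for every OS you pass in; build_pkg and build_matrix are hypothetical helper names, not part of Bigtop. Setting DRY_RUN=1 prints the docker commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Bigtop): build packages for every
# OS/component pair, one throwaway container per build.
# Assumption: a bigtop/slaves:trunk-<os> image exists for each OS passed in.

# Run (or, with DRY_RUN=1, just print) one build inside its slave image.
build_pkg() {
  local os=$1 component=$2
  local cmd=(docker run -u jenkins --rm -v "$PWD":/bigtop --workdir /bigtop
             "bigtop/slaves:trunk-$os"
             bash -l -c "./gradlew allclean $component-pkg")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Build each component on each OS; keep going after a failure and
# return non-zero if any build failed.
build_matrix() {
  local oses=$1 components=$2 failed=0 os c
  for os in $oses; do
    for c in $components; do
      if ! build_pkg "$os" "$c"; then
        echo "FAILED: $os/$c" >&2
        failed=$((failed + 1))
      fi
    done
  done
  [ "$failed" -eq 0 ]
}
```

For example, DRY_RUN=1 build_matrix "centos-7 debian-8" "hadoop spark" prints four docker commands. Continuing past a failed build mirrors a CI matrix, where one broken component should not hide the results of the others.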
  • 24. Bigtop packages built from master: https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
  • 25. Extremely friendly for porting
    • Example: how to port the Bigtop distribution to PPC64LE?
      ▪ Prepare a PPC64LE Docker base image
      ▪ Apply Bigtop Toolchain on the PPC64LE Docker image
      ▪ Build Bigtop packages on the PPC64LE slave image
    • 2016: ported 22 of 24 Bigtop components in 2 weeks, with only 5 patches
    • Credit: Amir Sanjar, IBM
  • 26. Bigtop's early mission accomplished: leveraged by app providers…
  • 27. Get out of the Apache dome
  • 28. New focus and target users
    • Data engineers vs. distro builders
    • Solution diversity:
      ▪ Streaming: Flink, Apex
      ▪ In-memory cache: Alluxio, Ignite
      ▪ User/developer tools: Bigtop Provisioner, Bigtop Sandbox
    • Big data stack references
    • Machine learning and deep learning components
  • 29. DOCKER FOR BIGTOP PROVISIONER
  • 30. Bigtop Provisioner
    • A tool that demonstrates the full Bigtop life cycle (Packaging, Testing, Deployment, Virtualization):
      create resources → run Bigtop Puppet → run Bigtop Tests
  • 31. Vagrant Providers
    • We use Vagrant as an abstraction layer to support different kinds of resource providers
  • 32. One-click Hadoop provisioning (Bigtop 1.0.0)
    • ./docker-hadoop.sh -c 3: three containers from the bigtop/deploy image on Docker Hub, each running puppet apply
    • https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
  • 33. Problems with Vagrant's Docker Provider
    • Need to add the Vagrant public key into Docker images
    • Too many issues with the auto-created boot2docker VM
    • A bug in the Docker provider regarding provisioning has stayed open for 2 years
      ▪ "Waiting for machine to boot" hangs indefinitely
    • Cannot share the same code across different providers anyway
    • Not all Docker options are supported in the Vagrantfile
    • ^#?& slow
  • 34. Replaced by docker-compose (Bigtop 1.2.0)
    • ./docker-hadoop.sh -c 3: three containers from the bigtop/deploy image on Docker Hub, each running puppet apply
  • 35. Advantages
    • No need to create a customized image beforehand
    • Better compatibility with Docker's native solutions
    • Clear, simple YAML file for orchestration settings
    • Supports new features such as overlay networks
    • Leverages Swarm for multi-node cluster deployment
    • Fast → better user experience
  • 36. How to run Docker Provisioner
    • Jenkins "Execute shell" build step (see the Bigtop CI Setup Guide):

      # See bigtop/provisioner/docker/*.yaml for example configs
      CONFIG=YOUR_CUSTOM_CONF.yaml
      # Provision
      ./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 docker-provisioner
      # Destroy the provisioned cluster
      ./gradlew docker-provisioner-destroy
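Because a run that fails part-way can leave containers behind, it can help to pair provisioning with an unconditional destroy. A minimal sketch, assuming bash and that it runs from the root of a Bigtop checkout; with_cluster is a hypothetical helper, and GRADLEW is an assumed override hook (for example a dry-run stub), not a Bigtop setting. Only the two Gradle tasks shown above are used.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Bigtop): provision a cluster, run a
# command against it, then destroy the cluster whether or not anything failed.
# Assumption: GRADLEW may be overridden, e.g. with a stub for dry runs.
with_cluster() {
  local config=$1 instances=$2
  shift 2
  local gradlew=${GRADLEW:-./gradlew} rc=0
  if "$gradlew" -Pconfig="$config" -Pnum_instances="$instances" docker-provisioner; then
    # Cluster is up: run the caller's command and remember its status.
    "$@" || rc=$?
  else
    # Provisioning failed part-way; still fall through to the destroy step.
    rc=$?
  fi
  "$gradlew" docker-provisioner-destroy
  return "$rc"
}
```

Usage might look like with_cluster "$CONFIG" 3 ./my-tests.sh, where my-tests.sh stands in for your own test script. The wrapper returns the test command's status, or the provisioning status if the cluster never came up.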
  • 37. YOUR_CUSTOM_CONF.yaml example

      docker:
        memory_limit: "4g"
        image: "bigtop/puppet:centos-7"
      repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
      distro: centos
      components: [hdfs, yarn, mapreduce]
      enable_local_repo: false
      smoke_test_components: [hdfs, yarn, mapreduce]
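When trying several distros, a config with the keys of the example above can be generated instead of hand-edited. A minimal sketch, assuming the image tag follows bigtop/puppet:<os>-<release> and the repo URL follows the releases/<version>/<os>/<release>/x86_64 layout of the 1.2.0 CentOS example; write_provisioner_conf is a hypothetical helper. The distro value defaults to the OS name, which matches the CentOS example but may differ for other distros, so it can be passed explicitly as a fifth argument.

```shell
#!/usr/bin/env bash
# Hypothetical generator (not part of Bigtop) for a Docker provisioner config
# with the same keys as the example above.
# Assumptions: image tags look like bigtop/puppet:<os>-<release>, repos live
# under releases/<version>/<os>/<release>/x86_64, and the distro value
# defaults to <os> (true for centos; check Bigtop's bundled configs for others).
write_provisioner_conf() {
  local os=$1 release=$2 version=$3 components=$4 distro=${5:-$1}
  cat <<EOF
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:${os}-${release}"
repo: "http://bigtop-repos.s3.amazonaws.com/releases/${version}/${os}/${release}/x86_64"
distro: ${distro}
components: [${components}]
enable_local_repo: false
smoke_test_components: [${components}]
EOF
}
```

Running write_provisioner_conf centos 7 1.2.0 "hdfs, yarn, mapreduce" > YOUR_CUSTOM_CONF.yaml reproduces the CentOS example.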
  • 39. Use cases
    • For application developers, cluster admins, users
      ▪ Run a Hadoop cluster to test your code on
      ▪ Try and test configurations before applying them to production
      ▪ Play around with Bigtop Big Data Stacks
    • For contributors
      ▪ Easily test your packaging, deployment, and testing code
    • For distro builders
      ▪ CI matrix → easier to patch upstream code
  • 41. Introducing Bigtop Sandbox
    • An easy way to get started
    • Docker images that have Bigtop stacks installed and configured
    • A pseudo-distributed cluster up and running without installation
    • A command-line tool to build your own stack
  • 43. Docker image layers (concrete implementation), top to bottom:
    • HDFS + YARN + Spark
    • Bigtop Puppet
    • bigtop/puppet:ubuntu-16.04
  • 45. site.yaml example

      bigtop::hadoop_head_node: bigtop.example.com
      bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
      hadoop::hadoop_storage_dirs: [/data/1, /data/2]
      hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
  • 46. How to build

      git clone https://github.com/apache/bigtop.git
      cd bigtop/docker/sandbox
      ./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, yarn, spark"

    • Or specify your custom conf:

      ./build.sh -a bigtop -o ubuntu-16.04 -f custom_site.yaml -t dws2017
  • 48. How to run

      docker run --name sandbox -d -p 50070:50070 -p 8088:8088 evansye/sandbox:dws2017
      docker logs -f sandbox
      docker exec sandbox spark-example SparkPi
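The sandbox needs some time after docker run before its services answer. Rather than guessing a sleep, one can poll the published web UIs. A minimal sketch, assuming bash and curl are available and that ports 50070 (HDFS NameNode UI) and 8088 (YARN ResourceManager UI) are published as in the command above; wait_for and http_up are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Bigtop) for waiting on a sandbox.

# wait_for TRIES DELAY CMD...: run CMD up to TRIES times, sleeping DELAY
# seconds between attempts; succeed as soon as CMD succeeds.
wait_for() {
  local tries=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= tries; i++)); do
    "$@" && return 0
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# http_up URL: true once URL answers with a non-error HTTP status.
http_up() {
  curl -fs -o /dev/null "$1"
}
```

A possible usage, giving each UI up to five minutes:

      wait_for 60 5 http_up http://localhost:50070 &&
        wait_for 60 5 http_up http://localhost:8088 &&
        docker exec sandbox spark-example SparkPi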
  • 50. Bigtop Provisioner vs. Bigtop Sandbox

                           Bigtop Provisioner   Bigtop Sandbox
      Scalable             Yes                  No
      Portable             No                   Yes
      Flexibility          High                 Medium
      Speed                > 2 mins             > 15 secs
      Requires network     Yes                  No
      Port forwarding      No                   Yes
  • 51. Bigtop Provisioner vs. Bigtop Sandbox by user role

                        Bigtop Provisioner                          Bigtop Sandbox
      Data engineers    Multi-node cluster testing                  Build/use sandboxes for dev & test
      Ops               Multi-node cluster testing                  Single-node testing
      Contributors      Test packages, Puppet recipes, test cases   Test packages, Puppet recipes, test cases
      Distro builders   Test packages, Puppet recipes, test cases   Provide sandboxes
  • 52. Integration test in CI/CD pipeline
    • CD pipeline with Bigtop Sandbox: Source code → Compile → Unit Test → Build Image → Integration test with Sandbox (Sandbox service + data) → Push Image to Docker Registry → Deploy → FINISHED
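The integration-test stage of such a pipeline can be sketched as a throwaway sandbox container whose lifetime is bound to the test run. A minimal sketch, assuming bash; integration_test is a hypothetical helper, the image and test command are placeholders for your own, and DOCKER is an assumed override hook for dry runs. In practice the test command should first wait until the sandbox services are reachable.

```shell
#!/usr/bin/env bash
# Hypothetical pipeline step (not part of Bigtop): start a sandbox, run the
# tests, and always remove the container so the CI agent stays clean.
# Assumption: DOCKER may be overridden, e.g. with a stub for dry runs.
integration_test() {
  local image=$1 name=$2
  shift 2
  local docker=${DOCKER:-docker} rc=0
  # Start the sandbox detached; give up early if it cannot even start.
  "$docker" run --name "$name" -d "$image" >/dev/null || return 1
  # Run the caller's test command and remember its status.
  "$@" || rc=$?
  # Remove the container whether the tests passed or not.
  "$docker" rm -f "$name" >/dev/null
  return "$rc"
}
```

For example, integration_test evansye/sandbox:dws2017 it-sandbox docker exec it-sandbox spark-example SparkPi gates the later pipeline stages on the test's exit status.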
  • 53. Future
    • Production deployment using Sandbox images
      ▪ --net host or overlay network (SDN)?
      ▪ External volumes for edit logs, fsimages, etc.
      ▪ Cluster orchestration: Swarm, Kubernetes?
  • 55. Bigtop 1.2.0 released April 2017
    ▪ New components:
      ▪ Ambari 2.5.0
      ▪ GPDB 5.0.0-alpha.0 (Greenplum)
    ▪ Featured upgrades:
      ▪ Hadoop 2.7.3
      ▪ Spark 2.1.0
      ▪ Kafka 0.10.1.1
      ▪ HBase 1.1.3
      ▪ and more
  • 56. New features in Bigtop 1.2.0
    • New features:
      ▪ Juju Bigtop charms
      ▪ Bigtop Sandbox (alpha; trying master is recommended)
    • Improvement:
      ▪ Bigtop Docker Provisioner made faster
  • 57. Juju Cloud Weather Report: http://bigtop.charm.qa/
  • 58. Bigtop 1.2.1 upcoming
    • Expected to be out in late June
    • Hadoop 2.7.4 (interested in having Docker container support back-ported, but I'm not sure yet)
    • Mainly bug fixes:
      ▪ Packages
      ▪ Deployments
      ▪ Sandbox
  • 59. Road ahead towards 1.3.0
    • Machine learning and deep learning integration
    • Support aarch64
    • Enhance the support set in Bigtop Puppet (not all components are covered)
    • Extend the CI matrix coverage to Bigtop Tests
    • Ambari Bigtop stack integration
    • Provide big data stack references
  • 61. ODPi Apache Bigtop Test Drive Program
    • Submit your proposal and contribute to Bigtop with funding!
    • Improvements, new features, build, test, CI, etc.
    • CFP opened June 13, 2017; CFP closed July 14, 2017
    • https://www.odpi.org/community/bigtopgrantfund
  • 62. Reference
    • Join the mailing list, ask questions, suggest features, etc.
    • Contribute (components, tutorials, docs)
    • Report bugs
    • Links:
      ▪ Home page: http://bigtop.apache.org/
      ▪ Mailing list: http://bigtop.apache.org/mail-lists.html
      ▪ Documentation: https://cwiki.apache.org/confluence/display/BIGTOP/Index
      ▪ Source code: https://github.com/apache/bigtop
      ▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
      ▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP