HKG18-220: State of Big Data on AArch64 - Apache Bigtop
Ganesh Raju, Jun He & Naresh Bhat
Big Data Team, LEG
Agenda
● Why and What is Bigtop
● Bigtop patches for AArch64
● Challenges of porting
● Walk through of setup and installation process
● Demo on Provisioning and Smoke Tests
Why Apache Bigtop?
● Hadoop is a collection of many components
○ Numerous versions (Dependency hell)
○ Lots of patches
○ No stable development environment with certified
binaries
○ No proper integrated tests
● With Bigtop: build, deploy to a cluster with Puppet, configure, install,
and test
● Juju orchestration
● Blueprints
● Seamless integration into CI
What is in Apache Bigtop?
● Output
○ A set of binary packages (deb and rpm), just like HDP, ODPi, CDH, etc.
○ Docker images
○ Docker sandbox images
● Integration code, Packaging code, Deployment code, Orchestration code
● Validation code
○ Integration tests
■ Clean slate provisioning
■ Dependency integration artifacts
■ Versioned test artifacts
■ Plug and play artifacts
■ JVM-based artifacts
○ Packaging tests
○ Smoke tests
● Continuous Integration
Consumers of Bigtop
Some of the consumers of Bigtop:
● ODPi
● Hortonworks
● Amazon
● Canonical
● EMC
● Pivotal
● Infosys
● Capgemini
● eBay
● Intel
● Trend Micro
● WANdisco
Bigtop v1.2
A few highlights of the v1.2 series release include:
● 6 distros and 2 architectures (x86 and ppc64le) supported
○ ARM support with v1.3
● A newly introduced Bigtop Sandbox feature
● A faster Docker Provisioner, rewritten to fully embrace the Docker
ecosystem
● OpenJDK 8 support
● Hadoop 2.7.3, Spark 2.1.1, HBase 1.1.9, and Zeppelin 0.7.2 are used
● And many upgrades of the ecosystem projects
(Apex, Crunch, Flume, Ignite, Mahout, Oozie, Phoenix, and many others)
Components of Bigtop
alluxio v1.0.1                | Greenplum gpdb v5.0.0-alpha.0   | Apache pig v0.15.0
Apache ambari v2.5.0          | Apache hadoop v2.7.3            | Quantcast qfs v1.1.4
Apache apex v3.5.0            | Apache hama v0.7.0              | Apache solr v4.10.4
groovy v2.4.10                | Apache hbase v1.1.3             | Apache spark 1.1 v1.6.2
Apache commons - jsvc v1.0.15 | Apache hive v1.2.1              | Apache spark 2.0 v2.1.1
Apache tomcat v6.0.45         | Apache hue v3.11.0              | Apache sqoop v1 v1.4.6
bigtop_utils v1.2.0           | Apache ignite v1.9.0            | Apache sqoop v2 v1.99.4
Apache crunch v0.14.0         | Apache kafka v0.10.1.1          | Apache tajo v0.11.1
Pig UDF datafu v1.3.0         | kite v1.1.0                     | Apache tez v0.6.2
Apache flink v1.1.3           | Apache mahout v0.12.2           | ycsb v0.4.0
Apache flume v1.7.0           | Apache oozie v4.3.0             | Apache zeppelin v0.7.0
Apache giraph v1.1.0          | Apache phoenix v4.9.0-HBase-1.1 | Apache zookeeper v3.4.6
Contributions from ARM Ecosystem
● AArch64 CI nodes are running on Linaro DevCloud
○ 3 distros are supported: Debian-9, Fedora-26, Ubuntu-16.04
https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
Contributions from ARM Ecosystem
● Patches to enable components on AArch64
○ Build
■ Hadoop, Solr, Hbase, Ignite, …
○ Package
■ Hama, Solr, Oozie, Hue, …
○ Deploy
■ Service scripts, Automation scripts, Dockerfiles,…
○ Test
■ SmokeTests and Provisioner settings
Contributions from ARM Ecosystem
● Lessons learned
○ Dependency issues
■ Native binaries: protobuf, phantomjs, …
■ Jars: leveldbjni, ignite-shmem, jffi, …
■ Version mismatch: slf4j, log4j, log4j2, …
○ Repos
■ Official releases did not support AArch64
● Had to create a private/local repo
Challenges
● Cyclic references take a lot of effort to fix
● Although most big data companies use Bigtop, few contributions have
come in from them
● With the founders leaving at the end of last year and lead committers
changing focus, the project has lost momentum
Roadmap
● Make Bigtop a first-class citizen on Kubernetes
● Test the Puppet deployment code in a variety of scenarios, including
developers spinning up test clusters via the Docker deployer
● Improve the Docker deployer to be more developer-friendly and hook it back
into Gradle
● Provide predefined sample stacks for specific use cases. For example:
○ Machine Learning: Hadoop+Spark+Zeppelin
○ Streaming: Hadoop+Kafka+Flink
● Create more tests
● Enable Ambari to install the Bigtop stack, using the work done for ODPi
Sandbox
● What is Sandbox?
A tool to build and run a big data pseudo-cluster using Docker
● Command to generate sandbox image
$ ./build.sh -a bigtop -o centos-7 -c "hdfs, yarn, spark, ignite"
● You can do a dry run with the “--dryrun” option
● How to run the sandbox image
$ docker run -d -p 50070:50070 bigtop/sandbox:centos-7_hdfs
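The two sandbox commands above can be tied together with a small helper. This is only a sketch: the function names `sandbox_tag` and `sandbox_run_cmd` are hypothetical, and the `<os>_<component>_...` tag pattern is an assumption extrapolated from the single example tag on the slide (`centos-7_hdfs`). The helper prints the `docker run` command rather than executing it.

```shell
#!/bin/sh
# Hypothetical helper: derive a sandbox image tag from an OS name and a
# comma-separated component list. The "<os>_<component>_..." pattern is an
# assumption based on the one example tag shown on the slide.
sandbox_tag() {
    os="$1"
    # Drop spaces, then turn commas into underscores.
    components=$(printf '%s' "$2" | tr -d ' ' | tr ',' '_')
    printf 'bigtop/sandbox:%s_%s\n' "$os" "$components"
}

# Print (not run) the docker command for a tag, publishing port 50070
# exactly as the slide does.
sandbox_run_cmd() {
    printf 'docker run -d -p 50070:50070 %s\n' "$1"
}

sandbox_run_cmd "$(sandbox_tag centos-7 "hdfs")"
```

Printing the command first makes it easy to check the tag against `docker images` before starting a container.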
Smoke Tests
● Configured through a YAML file located under
<BIGTOP_SRC_TOP>/provisioner/docker/
● Components to configure
○ docker image
○ distro type
○ components to install
○ components to test
○ JDK
● Environment check
$ ./docker-hadoop.sh -E
● Execution
$ cd provisioner/docker
$ ./docker-hadoop.sh -C <smoke_test_cfg_yaml> -c <node_count> -s -d
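The configuration items and the execution command above can be sketched as a script that writes a config file and prints the matching invocation. Treat it as an illustration only: the helper names `write_smoke_cfg` and `smoke_cmd` are hypothetical, the YAML key names are assumptions modelled on the five items the slide lists, and every value passed in is a placeholder. Compare the result with the config files shipped under provisioner/docker/ before using it.

```shell
#!/bin/sh
# Hypothetical helper: write a smoke-test config holding the five settings
# the slide lists. The YAML key names are assumptions, not confirmed here.
write_smoke_cfg() {
    cfg="$1"; image="$2"; distro="$3"; components="$4"; jdk="$5"
    cat > "$cfg" <<EOF
docker:
  image: "$image"
distro: $distro
components: [$components]
smoke_test_components: [$components]
jdk: "$jdk"
EOF
}

# Print (not run) the execution command from the slide for a given
# config file and node count.
smoke_cmd() {
    printf './docker-hadoop.sh -C %s -c %s -s -d\n' "$1" "$2"
}

cfg="${TMPDIR:-/tmp}/smoke_cfg_example.yaml"
# All values below are placeholders chosen for illustration.
write_smoke_cfg "$cfg" "example/image:tag" "debian" "hdfs, yarn" "openjdk-8-jdk"
smoke_cmd "$cfg" 3
```

Here the same component list is installed and tested; the slide lists them as separate settings, so the tested set can presumably be a subset of what is installed.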
Glossary
● Linaro Collaborate page
● Bigtop wiki page
● Smoke Test Collaborate page
● Smoke Test Results
Thank You
#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org
State of Big Data on ARM64 / AArch64 - Apache Bigtop

  • 1. HKG18-220: State of Big Data on AArch64 - Apache Bigtop
    Ganesh Raju, Jun He & Naresh Bhat
    Big Data Team, LEG
  • 2. Agenda
    ● Why and What is Bigtop
    ● Bigtop patches for AArch64
    ● Challenges of porting
    ● Walk through of setup and installation process
    ● Demo on Provisioning and Smoke Tests
  • 3. Why Apache Bigtop?
    ● Hadoop is a collection of many components
      ○ Numerous versions (dependency hell)
      ○ Lots of patches
      ○ No stable development environment with certified binaries
      ○ No proper integrated tests
    ● With Bigtop: build, deploy in a cluster with Puppet, configure, install and test
    ● Juju orchestration
    ● Blueprints
    ● Seamless integration into CI
  • 4. What is in Apache Bigtop?
    ● Output
      ○ A set of binaries (deb and rpm), just like HDP, ODPi, CDH, etc.
      ○ Docker images
      ○ Docker sandbox images
    ● Integration code, Packaging code, Deployment code, Orchestration code
    ● Validation code
      ○ Integration tests
        ■ Clean slate provisioning
        ■ Dependency integration artifacts
        ■ Versioned test artifacts
        ■ Plug and play artifacts
        ■ JVM-based artifacts
      ○ Packaging tests
      ○ Smoke tests
    ● Continuous Integration
  • 5. Consumers of Bigtop Some of the consumers of Bigtop ● ODPi ● Hortonworks ● Amazon ● Canonical ● EMC ● Pivotal ● Infosys ● Capgemini ● eBay ● Intel ● Trend Micro ● WANdisco
  • 6. Bigtop v1.2
    A few highlights of the v1.2 series release include:
    ● 6 distros and 2 architectures (x86 and ppc64le) supported
      ○ ARM support with v1.3
    ● A newly introduced Bigtop Sandbox feature
    ● A faster Docker Provisioner, rewritten to fully embrace the Docker ecosystem
    ● OpenJDK 8 support
    ● Hadoop 2.7.3, Spark 2.1.1, HBase 1.1.9, and Zeppelin 0.7.2 are used
    ● And many upgrades of the ecosystem projects (Apex, Crunch, Flume, Ignite, Mahout, Oozie, Phoenix, and many others)
  • 7. Components of Bigtop
    alluxio v1.0.1 | Greenplum gpdb v5.0.0-alpha.0 | Apache pig v0.15.0
    Apache ambari v2.5.0 | Apache hadoop v2.7.3 | Quantcast qfs v1.1.4
    Apache apex v3.5.0 | Apache hama v0.7.0 | Apache solr v4.10.4
    groovy v2.4.10 | Apache hbase v1.1.3 | Apache spark 1.1 v1.6.2
    Apache commons - jsvc v1.0.15 | Apache hive v1.2.1 | Apache spark 2.0 v2.1.1
    Apache tomcat v6.0.45 | Apache hue v3.11.0 | Apache sqoop v1 v1.4.6
    bigtop_utils v1.2.0 | Apache ignite v1.9.0 | Apache sqoop v2 v1.99.4
    Apache crunch v0.14.0 | Apache kafka v0.10.1.1 | Apache tajo v0.11.1
    Pig UDF datafu v1.3.0 | kite v1.1.0 | Apache tez v0.6.2
    Apache flink v1.1.3 | Apache mahout v0.12.2 | ycsb v0.4.0
    Apache flume v1.7.0 | Apache oozie v4.3.0 | Apache zeppelin v0.7.0
    Apache giraph v1.1.0 | Apache phoenix v4.9.0-HBase-1.1 | Apache zookeeper v3.4.6
  • 8. Contributions from ARM Ecosystem
    ● AArch64 CI nodes are running on Linaro DevCloud
      ○ 3 distros are supported: Debian-9, Fedora-26, Ubuntu-16.04
    https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
  • 9. Contributions from ARM Ecosystem
    ● Patches to enable components on AArch64
      ○ Build
        ■ Hadoop, Solr, HBase, Ignite, …
      ○ Package
        ■ Hama, Solr, Oozie, Hue, …
      ○ Deploy
        ■ Service scripts, Automation scripts, Dockerfiles, …
      ○ Test
        ■ SmokeTests and Provisioner settings
  • 10. Contributions from ARM Ecosystem
    ● Lessons learned
      ○ Dependency issues
        ■ Native binaries: protobuf, phantomjs, …
        ■ Jars: leveldb-jni, ignite-shmem, jffi, …
        ■ Version mismatch: slf4j, log4j, log4j2, …
      ○ Repos
        ■ Official release did not support aarch64
          ● Had to create private/local repo
  • 11. Challenges
    ● Cyclic references take a lot of effort to fix
    ● Although most big data companies use Bigtop, they have not been contributing back to it
    ● With the founders moving on at the end of last year and lead committers changing focus, the project has lost momentum
  • 12. Roadmap
    ● Make Bigtop a first-class citizen on Kubernetes
    ● Test out the Puppet deployment code in a variety of scenarios, including developers spinning up test clusters via the Docker deployer
    ● Improve the Docker deployer to be more developer friendly and hook it back into Gradle
    ● Provide predefined sample stacks for specific use cases. For example:
      ○ Machine Learning: Hadoop+Spark+Zeppelin
      ○ Streaming: Hadoop+Kafka+Flink
    ● Create more tests
    ● Enable Ambari to install the Bigtop stack, utilizing the work done for ODPi
  • 13. Sandbox
    ● What is Sandbox? A tool to build and run a big data pseudo-cluster using Docker
    ● Command to generate a sandbox image
      $ ./build.sh -a bigtop -o centos-7 -c "hdfs, yarn, spark, ignite"
    ● You can do a dry run with the “--dryrun” option
    ● How to run the sandbox image
      $ docker run -d -p 50070:50070 bigtop/sandbox:centos-7_hdfs
  • 14. Smoke Tests
    ● Configured through a YAML file located under <BIGTOP_SRC_TOP>/provisioner/docker/
    ● Components to configure
      ○ docker image
      ○ distro type
      ○ components to install
      ○ components to test
      ○ JDK
    ● Environment check
      $ ./docker-hadoop.sh -E
    ● Execution
      $ cd provisioner/docker
      $ ./docker-hadoop.sh -C <smoke_test_cfg_yaml> -c <node_count> -s -d
  • 15. Glossary ● Linaro Collaborate page ● Bigtop wiki page ● Smoke Test Collaborate page ● Smoke Test Results
  • 16. Thank You
    #HKG18
    HKG18 keynotes and videos on: connect.linaro.org
    For further information: www.linaro.org
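
The Sandbox and Smoke Tests slides (13 and 14) list their commands one at a time. As a rough illustration of how they could be parameterized, here is a minimal shell sketch that only prints the command lines rather than running them. The flags, the image name `bigtop/sandbox`, and the script names come from the slides. The function names, the config file name `my_smoke_cfg.yaml`, and the node count of 3 are assumptions made for this example, and appending `--dryrun` to the end of the `build.sh` line is an assumed placement of that option.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands from the Sandbox and Smoke Tests slides
# instead of executing them, so it does not need Docker or a Bigtop checkout.
set -euo pipefail

# Print the sandbox build command.
# $1 = OS tag, $2 = component list, $3 = "dry" to append --dryrun (optional)
sandbox_build_cmd() {
  local os="$1" components="$2" mode="${3:-}"
  local cmd="./build.sh -a bigtop -o ${os} -c \"${components}\""
  if [ "${mode}" = "dry" ]; then
    cmd="${cmd} --dryrun"   # assumed placement of the option
  fi
  printf '%s\n' "${cmd}"
}

# Print the command that starts a sandbox image and publishes one port.
# $1 = image tag, $2 = port published on the host and in the container
sandbox_run_cmd() {
  local tag="$1" port="$2"
  printf 'docker run -d -p %s:%s bigtop/sandbox:%s\n' "${port}" "${port}" "${tag}"
}

# Print the smoke test command for the Docker provisioner.
# $1 = smoke test config YAML, $2 = node count
smoke_test_cmd() {
  local cfg="$1" nodes="$2"
  printf './docker-hadoop.sh -C %s -c %s -s -d\n' "${cfg}" "${nodes}"
}

sandbox_build_cmd centos-7 "hdfs, yarn, spark, ignite" dry
sandbox_run_cmd centos-7_hdfs 50070
smoke_test_cmd my_smoke_cfg.yaml 3   # hypothetical file name and node count
```

Printing the commands first makes it easy to review them before pasting them into a shell in the relevant Bigtop directory; the smoke test line is meant to be run from provisioner/docker, as on slide 14.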