SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Downloaden Sie, um offline zu lesen
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend1
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
BigData with Free and Open Source Tools
2
NetBeans IDE for BigData Development with Apache Spark
Johannes Weigend - QAware GmbH
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
About this Talk
A brief overview about BigData Processing (10 Minutes)
Live Demo: Apache Zeppelin and Spark (5 Minutes)
Spark Programming with NetBeans (10 Minutes)
3
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Horizontal Scalability is Difficult!
■ Horizontal scalability of functions

■ Trivial
■ Loadbalancing of (stateless) services (makro- / microservices)
■ More users ! more machines
■ Non trivial
■ More machines ! faster response times
■ Horizontal scalability of data

■ Trivial

■ Linear distribution of data on multiple machines
■ More machines ! more data
■ Non trivial
■ Constant response times with growing datasets
4
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Hadoop Gives Answers to Horizontal Scalability of
Data and Functions
5
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
■ Distributed computing (100x faster than Hadoop (M/R)

■ Distributed Map/Reduce on distributed data can be done in-memory 

■ Written in Scala (JVM)

■ Java/Scala/Python APIs

■ Processes data from distributed and non-distributed sources

■Textfiles (accessible from all nodes)

■Hadoop File System (HDFS)

■Databases (JDBC)

■Solr per Lucidworks API

■...
READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Cluster
JVM
Worker
Worker
JVM
JVM
JVM
Worker
Master / Yarn / Mesos
JVM
Executor
Executor
JVM
JVM
JVM
Executor
start
start
start
Task
Task(s)
Slave
Slave
Slave
Master
Host
Spark
Context
MasterURL
Resilient
Distributed
Dataset
RDD
Driver Node
creates
Driver Application
Application
uses
Partition
Task(s)
Partition
Task(s)
Partition
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Apache Spark - Lambda on Steroids
9
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend10
„Put the Cloud in a Box“
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Cloud Case – 5x Intel NUC6i5SYK

6th generation Intel® Core™ i5-6260U processor
with Intel® Iris™ graphics
(1.9 GHz up to 2.8 GHz Turbo, Dual Core, 4 MB
Cache, 15W TDP)
CPU
32 GB Dual-channel DDR4 SODIMMs
1.2V, 2133 MHz
RAM
256 GB Samsung M.2 internal SSDDISK
! This case is as powerful as five notebooks
10 Cores, 20 HT Units, 160 GB RAM, 1,25 TB DiskTotal
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
LogFile Analysis with Apache Spark and NetBeans
■DEMO

- Getting Started with Spark Programming in NetBeans

- Working with Gradle projects and code completion

- Using a real cluster (The cloud case)

- Working with the remote terminal

- Using the embedded browser

- Using Docker

- Connect to a remote Docker Engine

- Using container logs
12
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Spark Pattern 1: Distributed Task with Params
14
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Spark Pattern 2: Distributed Read from External Sources
15
| Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
Spark Pattern 3: Caching and Further Processing with RDDs
16

Weitere ähnliche Inhalte

Andere mochten auch

Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)QAware GmbH
 
Automotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrAutomotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrQAware GmbH
 
Vamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital servicesVamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital servicesQAware GmbH
 
Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!QAware GmbH
 
A Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native StackA Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native StackQAware GmbH
 
Developing Skills for Amazon Echo
Developing Skills for Amazon EchoDeveloping Skills for Amazon Echo
Developing Skills for Amazon EchoQAware GmbH
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusQAware GmbH
 
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.QAware GmbH
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataJohn Nestor
 
Real World Analytics
Real World AnalyticsReal World Analytics
Real World AnalyticsNorbert Boros
 
Hands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and FunHands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and FunQAware GmbH
 
Kubernetes 101 and Fun
Kubernetes 101 and FunKubernetes 101 and Fun
Kubernetes 101 and FunQAware GmbH
 
Cloud Native Unleashed
Cloud Native UnleashedCloud Native Unleashed
Cloud Native UnleashedQAware GmbH
 
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017Mario-Leander Reimer
 
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickelnDie Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickelnQAware GmbH
 
Clickstream Analysis with Spark - Understanding Visitors in Real Time
Clickstream Analysis with Spark - Understanding Visitors in Real TimeClickstream Analysis with Spark - Understanding Visitors in Real Time
Clickstream Analysis with Spark - Understanding Visitors in Real TimeQAware GmbH
 
From pets to cattle - powered by CoreOS, docker, Mesos & nginx
From pets to cattle - powered by CoreOS, docker, Mesos & nginxFrom pets to cattle - powered by CoreOS, docker, Mesos & nginx
From pets to cattle - powered by CoreOS, docker, Mesos & nginxQAware GmbH
 
Everything-as-code - a polyglot journey.
Everything-as-code - a polyglot journey.Everything-as-code - a polyglot journey.
Everything-as-code - a polyglot journey.QAware GmbH
 
Clickstream Analysis with Apache Spark
Clickstream Analysis with Apache SparkClickstream Analysis with Apache Spark
Clickstream Analysis with Apache SparkQAware GmbH
 
Chronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockChronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockQAware GmbH
 

Andere mochten auch (20)

Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)Per Anhalter durch den Cloud Native Stack (extended edition)
Per Anhalter durch den Cloud Native Stack (extended edition)
 
Automotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache SolrAutomotive Information Research driven by Apache Solr
Automotive Information Research driven by Apache Solr
 
Vamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital servicesVamp - The anti-fragilitiy platform for digital services
Vamp - The anti-fragilitiy platform for digital services
 
Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!Azure Functions - Get rid of your servers, use functions!
Azure Functions - Get rid of your servers, use functions!
 
A Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native StackA Hitchhiker's Guide to the Cloud Native Stack
A Hitchhiker's Guide to the Cloud Native Stack
 
Developing Skills for Amazon Echo
Developing Skills for Amazon EchoDeveloping Skills for Amazon Echo
Developing Skills for Amazon Echo
 
Chronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for PrometheusChronix as Long-Term Storage for Prometheus
Chronix as Long-Term Storage for Prometheus
 
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
Everything-as-code. Polyglotte Software-Entwicklung in der Praxis.
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big Data
 
Real World Analytics
Real World AnalyticsReal World Analytics
Real World Analytics
 
Hands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and FunHands-on K8s: Deployments, Pods and Fun
Hands-on K8s: Deployments, Pods and Fun
 
Kubernetes 101 and Fun
Kubernetes 101 and FunKubernetes 101 and Fun
Kubernetes 101 and Fun
 
Cloud Native Unleashed
Cloud Native UnleashedCloud Native Unleashed
Cloud Native Unleashed
 
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
Everything as-code. Polyglotte Entwicklung in der Praxis. #oop2017
 
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickelnDie Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
Die Leichtigkeit des Seins: Bindings für Eclipse SmartHome entwickeln
 
Clickstream Analysis with Spark - Understanding Visitors in Real Time
Clickstream Analysis with Spark - Understanding Visitors in Real TimeClickstream Analysis with Spark - Understanding Visitors in Real Time
Clickstream Analysis with Spark - Understanding Visitors in Real Time
 
From pets to cattle - powered by CoreOS, docker, Mesos & nginx
From pets to cattle - powered by CoreOS, docker, Mesos & nginxFrom pets to cattle - powered by CoreOS, docker, Mesos & nginx
From pets to cattle - powered by CoreOS, docker, Mesos & nginx
 
Everything-as-code - a polyglot journey.
Everything-as-code - a polyglot journey.Everything-as-code - a polyglot journey.
Everything-as-code - a polyglot journey.
 
Clickstream Analysis with Apache Spark
Clickstream Analysis with Apache SparkClickstream Analysis with Apache Spark
Clickstream Analysis with Apache Spark
 
Chronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the BlockChronix Time Series Database - The New Time Series Kid on the Block
Chronix Time Series Database - The New Time Series Kid on the Block
 

Mehr von QAware GmbH

50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdf50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdfQAware GmbH
 
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...QAware GmbH
 
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzFully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzQAware GmbH
 
Down the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureDown the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureQAware GmbH
 
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!QAware GmbH
 
Make Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringMake Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringQAware GmbH
 
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightDer Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightQAware GmbH
 
Was kommt nach den SPAs
Was kommt nach den SPAsWas kommt nach den SPAs
Was kommt nach den SPAsQAware GmbH
 
Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo QAware GmbH
 
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See... Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...QAware GmbH
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster QAware GmbH
 
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.QAware GmbH
 
Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!QAware GmbH
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s AutoscalingQAware GmbH
 
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPKontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPQAware GmbH
 
Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.QAware GmbH
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s AutoscalingQAware GmbH
 
Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.QAware GmbH
 
Per Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysPer Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysQAware GmbH
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster QAware GmbH
 

Mehr von QAware GmbH (20)

50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdf50 Shades of K8s Autoscaling #JavaLand24.pdf
50 Shades of K8s Autoscaling #JavaLand24.pdf
 
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
Make Agile Great - PM-Erfahrungen aus zwei virtuellen internationalen SAFe-Pr...
 
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN MainzFully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
Fully-managed Cloud-native Databases: The path to indefinite scale @ CNN Mainz
 
Down the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile ArchitectureDown the Ivory Tower towards Agile Architecture
Down the Ivory Tower towards Agile Architecture
 
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!"Mixed" Scrum-Teams – Die richtige Mischung macht's!
"Mixed" Scrum-Teams – Die richtige Mischung macht's!
 
Make Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform EngineeringMake Developers Fly: Principles for Platform Engineering
Make Developers Fly: Principles for Platform Engineering
 
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit PlaywrightDer Tod der Testpyramide? – Frontend-Testing mit Playwright
Der Tod der Testpyramide? – Frontend-Testing mit Playwright
 
Was kommt nach den SPAs
Was kommt nach den SPAsWas kommt nach den SPAs
Was kommt nach den SPAs
 
Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo Cloud Migration mit KI: der Turbo
Cloud Migration mit KI: der Turbo
 
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See... Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
Migration von stark regulierten Anwendungen in die Cloud: Dem Teufel die See...
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
 
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
Endlich gute API Tests. Boldly Testing APIs Where No One Has Tested Before.
 
Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!Kubernetes with Cilium in AWS - Experience Report!
Kubernetes with Cilium in AWS - Experience Report!
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling
 
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAPKontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
Kontinuierliche Sicherheitstests für APIs mit Testkube und OWASP ZAP
 
Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.Service Mesh Pain & Gain. Experiences from a client project.
Service Mesh Pain & Gain. Experiences from a client project.
 
50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling50 Shades of K8s Autoscaling
50 Shades of K8s Autoscaling
 
Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.Blue turns green! Approaches and technologies for sustainable K8s clusters.
Blue turns green! Approaches and technologies for sustainable K8s clusters.
 
Per Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API GatewaysPer Anhalter zu Cloud Nativen API Gateways
Per Anhalter zu Cloud Nativen API Gateways
 
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
Aus blau wird grün! Ansätze und Technologien für nachhaltige Kubernetes-Cluster
 

Kürzlich hochgeladen

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 

Kürzlich hochgeladen (20)

Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 

NetBeans for Big Data

  • 1. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend1
  • 2. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend BigData with Free and Open Source Tools 2 NetBeans IDE for BigData Development with Apache Spark Johannes Weigend - QAware GmbH
  • 3. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend About this Talk A brief overview about BigData Processing (10 Minutes) Live Demo: Apache Zeppelin and Spark (5 Minutes) Spark Programming with NetBeans (10 Minutes) 3
  • 4. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Horizontal Scalability is Difficult! ■ Horizontal scalability of functions ■ Trivial ■ Loadbalancing of (stateless) services (makro- / microservices) ■ More users ! more machines ■ Non trivial ■ More machines ! faster response times ■ Horizontal scalability of data ■ Trivial ■ Linear distribution of data on multiple machines ■ More machines ! more data ■ Non trivial ■ Constant response times with growing datasets 4
  • 5. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Hadoop Gives Answers to Horizontal Scalability of Data and Functions 5
  • 6. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
  • 7. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend ■ Distributed computing (100x faster than Hadoop (M/R) ■ Distributed Map/Reduce on distributed data can be done in-memory ■ Written in Scala (JVM) ■ Java/Scala/Python APIs ■ Processes data from distributed and non-distributed sources ■Textfiles (accessible from all nodes) ■Hadoop File System (HDFS) ■Databases (JDBC) ■Solr per Lucidworks API ■... READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
  • 8. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Cluster JVM Worker Worker JVM JVM JVM Worker Master / Yarn / Mesos JVM Executor Executor JVM JVM JVM Executor start start start Task Task(s) Slave Slave Slave Master Host Spark Context MasterURL Resilient Distributed Dataset RDD Driver Node creates Driver Application Application uses Partition Task(s) Partition Task(s) Partition
  • 9. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Apache Spark - Lambda on Steroids 9
  • 10. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend10 „Put the Cloud in a Box“
  • 11. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Cloud Case – 5x Intel NUC6i5SYK
 6th generation Intel® Core™ i5-6260U processor with Intel® Iris™ graphics (1.9 GHz up to 2.8 GHz Turbo, Dual Core, 4 MB Cache, 15W TDP) CPU 32 GB Dual-channel DDR4 SODIMMs 1.2V, 2133 MHz RAM 256 GB Samsung M.2 internal SSDDISK ! This case is as powerful as five notebooks 10 Cores, 20 HT Units, 160 GB RAM, 1,25 TB DiskTotal
  • 12. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend LogFile Analysis with Apache Spark and NetBeans ■DEMO - Getting Started with Spark Programming in NetBeans - Working with Gradle projects and code completion - Using a real cluster (The cloud case) - Working with the remote terminal - Using the embedded browser - Using Docker - Connect to a remote Docker Engine - Using container logs 12
  • 13. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
  • 14. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Spark Pattern 1: Distributed Task with Params 14
  • 15. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Spark Pattern 2: Distributed Read from External Sources 15
  • 16. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend Spark Pattern 3: Caching and Further Processing with RDDs 16