JavaOne Conference 2016, San Francisco: Talk by Johannes Weigend (@johannesweigend, CTO at QAware).
Abstract: Large enterprise applications have a specific set of challenges that can be solved with the right combination of tools and techniques. This session provides overviews and demos of problems and solutions relating to big data and how to do large-scale intelligent image processing in Java with assistance from the NetBeans IDE. Among many other things, you will learn what big data means, how to work with Hadoop, what image recognition libraries are available, how they integrate with big data tools, and what the general workflow is in this context.
1. | Java One 2016 | UGF6436 | BigData with Free and Open Source Tools | Johannes Weigend
BigData with Free and Open Source Tools
NetBeans IDE for BigData Development with Apache Spark
Johannes Weigend - QAware GmbH
About this Talk
A brief overview of BigData processing (10 Minutes)
Live Demo: Apache Zeppelin and Spark (5 Minutes)
Spark Programming with NetBeans (10 Minutes)
Horizontal Scalability is Difficult!
■ Horizontal scalability of functions
  ■ Trivial
    ■ Load balancing of (stateless) services (macro- / microservices)
    ■ More users → more machines
  ■ Non-trivial
    ■ More machines → faster response times
■ Horizontal scalability of data
  ■ Trivial
    ■ Linear distribution of data on multiple machines
    ■ More machines → more data
  ■ Non-trivial
    ■ Constant response times with growing datasets
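The "linear distribution of data on multiple machines" point can be illustrated with simple hash partitioning, where each record key is mapped to one of N nodes. This is a minimal sketch and not taken from the talk: the class name `HashPartitionSketch`, the sample keys, and the node count of 5 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration: assigns record keys to one of N nodes by hash. */
final class HashPartitionSketch {

    /** Returns the node index in [0, nodeCount) for the given key. */
    static int nodeFor(String key, int nodeCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** Distributes the keys into one bucket per node. */
    static List<List<String>> distribute(List<String> keys, int nodeCount) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(nodeFor(key, nodeCount)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Sample keys are made up for this sketch.
        List<String> keys = Arrays.asList("log-0001", "log-0002", "log-0003", "log-0004", "log-0005");
        List<List<String>> buckets = distribute(keys, 5);
        for (int node = 0; node < buckets.size(); node++) {
            System.out.println("node " + node + " -> " + buckets.get(node));
        }
    }
}
```

Adding nodes in this scheme adds storage capacity, which is the trivial part. It does not by itself keep response times constant as the dataset grows, which is the non-trivial part named above.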
Hadoop Gives Answers to Horizontal Scalability of Data and Functions
■ Apache Spark: distributed computing, up to 100x faster than Hadoop MapReduce (M/R)
■ Distributed Map/Reduce on distributed data can be done in-memory
■ Written in Scala (JVM)
■ Java/Scala/Python APIs
■ Processes data from distributed and non-distributed sources
  ■ Text files (accessible from all nodes)
  ■ Hadoop Distributed File System (HDFS)
  ■ Databases (JDBC)
  ■ Solr via the Lucidworks API
  ■ ...
READ THIS: https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
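The map/reduce idea behind the points above can be sketched on a single JVM with `java.util.stream`. This is only a local analogue, not Spark code and not from the talk: the class `LocalMapReduceSketch`, the sample log lines, and the assumption that the log level is the first token of each line are all hypothetical. On a cluster, the same shape would typically be expressed in Spark with `mapToPair` and `reduceByKey` on a `JavaRDD`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical single-JVM analogue of an in-memory map/reduce over log lines. */
final class LocalMapReduceSketch {

    /** Map step: extracts the log level, assumed to be the first token of a line. */
    static String levelOf(String line) {
        return line.trim().split("\\s+")[0];
    }

    /** Maps each line to its level, then reduces by key to a count per level. */
    static Map<String, Long> countByLevel(List<String> lines) {
        return lines.parallelStream()
                .map(LocalMapReduceSketch::levelOf)
                .collect(Collectors.groupingBy(level -> level, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Sample lines are made up for this sketch.
        List<String> lines = Arrays.asList(
                "ERROR disk full",
                "INFO started",
                "ERROR timeout",
                "WARN slow response");
        System.out.println(countByLevel(lines));
    }
}
```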
Apache Spark - Lambda on Steroids
"Put the Cloud in a Box"
Cloud Case – 5x Intel NUC6i5SYK
CPU: 6th generation Intel® Core™ i5-6260U processor with Intel® Iris™ graphics (1.9 GHz up to 2.8 GHz Turbo, Dual Core, 4 MB Cache, 15W TDP)
RAM: 32 GB Dual-channel DDR4 SODIMMs, 1.2V, 2133 MHz
DISK: 256 GB Samsung M.2 internal SSD
→ This case is as powerful as five notebooks
Total: 10 Cores, 20 HT Units, 160 GB RAM, 1.25 TB Disk
LogFile Analysis with Apache Spark and NetBeans
■ DEMO
- Getting Started with Spark Programming in NetBeans
- Working with Gradle projects and code completion
- Using a real cluster (The cloud case)
- Working with the remote terminal
- Using the embedded browser
- Using Docker
- Connect to a remote Docker Engine
- Using container logs
Spark Pattern 1: Distributed Task with Params
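The slide's code did not survive extraction. As a rough single-JVM analogue of the pattern named in the title, the sketch below runs one task per parameter in parallel and collects the results. It is not Spark code and not the author's example: `DistributedTaskSketch`, `runForEach`, and the `sumUpTo` task are hypothetical. In Spark, the corresponding shape would typically be `JavaSparkContext.parallelize(params)` followed by `map(task)` and `collect()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical local analogue: one parallel task per parameter. */
final class DistributedTaskSketch {

    /** Runs the task once per parameter, in parallel, preserving parameter order. */
    static <P, R> List<R> runForEach(List<P> params, Function<P, R> task) {
        return params.parallelStream().map(task).collect(Collectors.toList());
    }

    /** Made-up task for this sketch: sums the integers from 1 to n. */
    static long sumUpTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> params = Arrays.asList(10, 100, 1000);
        System.out.println(runForEach(params, DistributedTaskSketch::sumUpTo));
    }
}
```

The point of the pattern is that only the small parameter list is handed out, and each worker does the heavy computation itself.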
Spark Pattern 2: Distributed Read from External Sources
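The slide's code did not survive extraction. One way to picture the pattern in the title: each parallel task receives only the location of a source and performs the read itself, so no single process has to pull all the data first. The sketch below is a single-JVM analogue using temporary files, not Spark code and not the author's example. `DistributedReadSketch`, `readSource`, and the sample files are hypothetical stand-ins for sources such as HDFS, JDBC, or Solr.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical local analogue: every parallel task reads its own source. */
final class DistributedReadSketch {

    /** Reads all lines of one source; each parallel task calls this for its own parameter. */
    static List<String> readSource(Path source) {
        try {
            return Files.readAllLines(source, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hands out source locations and lets each task read, then concatenates the results. */
    static List<String> readAll(List<Path> sources) {
        return sources.parallelStream()
                .flatMap(source -> readSource(source).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for external sources in this sketch.
        Path a = Files.createTempFile("node-a", ".log");
        Path b = Files.createTempFile("node-b", ".log");
        Files.write(a, Arrays.asList("ERROR disk full", "INFO started"), StandardCharsets.UTF_8);
        Files.write(b, Arrays.asList("WARN slow response"), StandardCharsets.UTF_8);
        System.out.println(readAll(Arrays.asList(a, b)));
        Files.delete(a);
        Files.delete(b);
    }
}
```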
Spark Pattern 3: Caching and Further Processing with RDDs
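The slide's code did not survive extraction. In Spark, calling `cache()` on an RDD asks the cluster to keep the computed partitions in memory so that later actions reuse them instead of recomputing. The sketch below shows the same idea on a single JVM: an expensive load runs once and several analyses reuse the result. It is not Spark code and not the author's example. `CachedDatasetSketch`, `cached`, and the sample log lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Hypothetical local analogue: load once, then run several analyses on the cached data. */
final class CachedDatasetSketch {

    /** Wraps a loader so that it runs at most once; later calls reuse the result. */
    static <T> Supplier<List<T>> cached(Supplier<List<T>> loader) {
        return new Supplier<List<T>>() {
            private List<T> data; // filled on first access

            @Override
            public synchronized List<T> get() {
                if (data == null) {
                    data = loader.get();
                }
                return data;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        Supplier<List<String>> logs = cached(() -> {
            loads.incrementAndGet(); // stands in for an expensive distributed read
            return Arrays.asList("ERROR disk full", "INFO started", "ERROR timeout");
        });

        // Two independent analyses reuse the same cached data.
        long errors = logs.get().stream().filter(line -> line.startsWith("ERROR")).count();
        List<String> infos = logs.get().stream()
                .filter(line -> line.startsWith("INFO"))
                .collect(Collectors.toList());

        System.out.println(errors + " errors, " + infos.size() + " info lines, loaded "
                + loads.get() + " time(s)");
    }
}
```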