Vademecum Big Data
Adam Kawa, Spotify, Compendium CE
About Me
Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
And The 20-Minute Story About ...




Image source: http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg
A Really Data-Driven Company …




Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg
And Some Inevitable Problems ...




Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg
And Some Inevitable Problems ...




Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg
And Some Inevitable Problems ...




Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg
Start!
The First Approach Works Fine ...
Until Data Gets Bigger ...
And More Diverse ...
The Data Monster Becomes A Problem




Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg
Apache Hadoop Becomes A Solution




Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg
Orchestra Of Nodes




Image source: http://www.dsn.jhu.edu/images/orchestra.gif
Fault-Tolerant Orchestra Of Nodes
Untypical Orchestra Of Typical* Nodes
* However, very cheap nodes are a false economy
Highly Scalable Orchestra Of Nodes
Hadoop Distributed File System (HDFS)




Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg
HDFS Blocks And Replication
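The arithmetic behind blocks and replication can be sketched in a few lines. This is only an illustration: the 64 MB block size and replication factor of 3 are assumed here as common defaults of that Hadoop generation, and `hdfs_footprint` is a hypothetical helper, not part of any Hadoop API.

```python
def hdfs_footprint(file_size_bytes, block_size_bytes=64 * 1024**2, replication=3):
    """Return (blocks, block replicas, raw bytes stored) for one file.

    Assumed defaults: 64 MiB blocks, replication factor 3.
    """
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    replicas = blocks * replication
    # The last block is not padded, so raw usage is simply size * replication.
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes


blocks, replicas, raw_bytes = hdfs_footprint(1024**3)  # a 1 GiB file
print(blocks, replicas, raw_bytes // 1024**2, "MiB raw")
```

Losing one node therefore costs at most one replica of each affected block, and the remaining copies are what make re-replication possible.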
HDFS Self-Healing Features




Image source: http://www.mwctoys.com/images/review_hydra_3.jpg
HDFS Scales And Shines With MapReduce




Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg
MapReduce Is A Change


DATA
Map And Reduce


Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png
Map And Reduce Functions
MapReduce Paradigm
Artist Count Example
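The slide's own data is not part of this transcript, so the following is a minimal in-memory sketch of what an artist count could look like in the map/reduce style. The record layout `(user, artist, track)`, the sample plays, and the helper `run_mapreduce` are assumptions made for illustration; a real job would run the same two functions across many nodes.

```python
from collections import defaultdict


def map_play(record):
    """Map: emit (artist, 1) for every play record."""
    _user, artist, _track = record
    yield artist, 1


def reduce_counts(artist, counts):
    """Reduce: sum all the 1s emitted for one artist."""
    return artist, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Hypothetical play log.
plays = [
    ("u1", "Artist A", "Track 1"),
    ("u2", "Artist B", "Track 7"),
    ("u3", "Artist A", "Track 2"),
]
print(run_mapreduce(plays, map_play, reduce_counts))
```

The grouping step in the middle is the part the framework provides; the developer writes only the two small functions.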
Sending Computation To Data


Data Is Here!
Computation


Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg
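A toy version of the placement decision behind "sending computation to data" might look like this. It is a deliberate simplification with hypothetical node names and a hypothetical `pick_node` helper; the real scheduler also takes rack locality and other factors into account.

```python
def pick_node(block_replicas, free_nodes):
    """Choose where to run a task that reads one block.

    Prefer a free node that already stores a replica of the block
    (data-local); otherwise fall back to any free node, which then
    has to read the block over the network.
    Returns (node, is_data_local).
    """
    for node in block_replicas:
        if node in free_nodes:
            return node, True
    if free_nodes:
        return min(free_nodes), False  # deterministic fallback for the sketch
    return None, False


print(pick_node(["n2", "n5", "n7"], {"n1", "n5"}))
print(pick_node(["n2"], {"n1", "n3"}))
```

Moving a small task to the node is cheap; moving a large block to the task is not, which is the point of the elephant-and-mouse picture.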
MapReduce Implementation




Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony
First Success: 5-Node Hadoop Cluster




Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg
Apache Whirr And The Cloud
===== hadoop.properties =============
whirr.cluster-name=production_cluster
whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,4 hadoop-datanode+hadoop-tasktracker
# or Rackspace: cloudservers-us
whirr.provider=aws-ec2
...
=====================================

$ whirr launch-cluster --config hadoop.properties
$ whirr destroy-cluster --config hadoop.properties
First Sad (Non-Java Speaking) Developers




Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg
Hadoop Streaming For Scripting Languages




Image source: http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg
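For non-Java developers, a Streaming job boils down to two programs that read lines and write tab-separated key/value lines. The sketch below assumes a hypothetical tab-separated play log (`user`, `artist`, `track`) and models both stages as functions over lines so the data flow can be tried locally; `sorted()` stands in for the framework's sort between map and reduce.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'artist<TAB>1' for every input line 'user<TAB>artist<TAB>track'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:  # skip malformed lines
            yield f"{fields[1]}\t1"


def reducer(lines):
    """Sum the counts per artist; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for artist, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{artist}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> sort -> reduce on hypothetical data.
log_lines = ["u1\tArtist A\tt1\n", "u2\tArtist B\tt2\n", "u3\tArtist A\tt3\n"]
for out in reducer(sorted(mapper(log_lines))):
    print(out)
```

On a cluster, each function would live in its own script that iterates over `sys.stdin` and prints to standard output, and the two scripts would be handed to the streaming jar through its `-mapper` and `-reducer` options.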
Apache Hive Makes You Feel Younger




Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg
Speak ~SQL, But Run As MapReduce
HUE - Browser-Based Environment




Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png
Hive Is Based On & Limited By Hadoop
Apache Pig Makes Them Happier!


                        




Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg
Pig Accelerates Development


        
Need To Add More Relational Data To HDFS




Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
SQL To Hadoop = Sqoop




Image source: http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg
Sqoop Import/Export Data Using MR




Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/
Apache Oozie For Defining Workflows




Image source: Apache Oozie website
Apache Oozie For Scheduling




Image source: http://risingtechies.files.wordpress.com/2012/05/schedule.jpg
Need To Add Even More Logs To HDFS




Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
Apache Flume For Data Collection
Channel types, e.g. JDBC, Memory, File




Image source: Apache Flume website
How To Manage A Larger Cluster
Apache Avro + Snappy/Deflate_6




Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg
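Reading "Deflate_6" as deflate compression at level 6 (an assumption, since the slide gives no further detail), the effect on repetitive log-like records can be tried with Python's standard `zlib` module. This is not Avro's own codec: `zlib.compress` wraps the deflate stream in a zlib header, Snappy is not in the standard library, and the sample records are made up.

```python
import zlib


def deflate_ratio(data, level=6):
    """Compress with deflate at the given level; return compressed/original size."""
    compressed = zlib.compress(data, level)
    # Round-trip check: compression must be lossless.
    assert zlib.decompress(compressed) == data
    return len(compressed) / len(data)


# Hypothetical, highly repetitive records, as event logs often are.
records = b"".join(
    b'{"user": "u%d", "artist": "Artist A"}\n' % (i % 10) for i in range(1000)
)
print(f"level 6 ratio: {deflate_ratio(records):.3f}")
```

Higher levels trade CPU time for smaller output, so the level is a tuning knob rather than a fixed choice.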
When Latency Is Too High




Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg
Cloudera Impala – Real-Time ~SQL Queries




Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg
Apache HBase - Random, Real-Time Access To Big Data




Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg
YARN – Makes The Hadoop Cluster More Robust




Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
Hadoop Is Successfully Deployed




Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg
Learn More About Apache Hadoop?
Use Hadoop To Solve Real-World Problems?
Oozie And YARN At WHUG, Today @18:00
Thank You! Any Questions About Them?




Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg
Apache Hadoop Ecosystem (based on an exemplary data-driven…

Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Apache Hadoop Ecosystem (based on an exemplary data-driven…

  • 1. Vademecum Big Data Adam Kawa, Spotify, Compendium CE
  • 2. About Me Spotify/Compendium, WHUG/SHUG, HakunaMapData.com, +2.5Y
  • 3. And The 20-Minute Story About ... Image source: http://www.containsmoderateperil.com/wp-content/uploads/2012/09/Dev-Diary-Epic-Story.jpg
  • 4. A Really Data-Driven Company … Image source: http://wwwimg.roku.com/hero-images/home2_1.jpg
  • 5. And Some Inevitable Problems ... Image source: http://www.digitalnewsasia.com/sites/default/files/images/digital%20economy/data%20explosion.jpg
  • 6. And Some Inevitable Problems ... Image source: http://p.alejka.pl/i2/p_new/36/42/grosz-na-szczescie-ze-zlota-m-1z-doskonaly-na-kazda-okazje_0_b.jpg
  • 7. And Some Inevitable Problems ... Image source: http://25.media.tumblr.com/d1038e7831eae86f5e84d0d09a2e6fad/tumblr_mfh5srmNAR1s06a3to1_500.jpg
  • 9. The First Approach Works Fine ...
  • 10. Until Data Gets Bigger ...
  • 12. The Data Monster Becomes A Problem Image source: http://cloudtimes.org/wp-content/uploads/2012/05/big-data.jpg
  • 13. Apache Hadoop Becomes A Solution Image source: http://gigaom2.files.wordpress.com/2012/06/shutterstock_60414424.jpg
  • 14. Orchestra Of Nodes Image source: http://www.dsn.jhu.edu/images/orchestra.gif
  • 16. Untypical Orchestra Of Typical* Nodes * however having very cheap nodes is false economy
  • 18. Hadoop Distributed File System (HDFS) Image source: http://www.wallcoo.net/car/Trucks/images/Big_Truck_on_Road_.jpg
  • 19. HDFS Blocks And Replication
  • 20. HDFS Self-Healing Features Image source: http://www.mwctoys.com/images/review_hydra_3.jpg
  • 21. HDFS Scales And Shines With MapReduce Image source: http://www.kkkp.pl/graph/gr_kdz_char3.jpg
  • 22. MapReduce Is A Change DATA Map And Reduce Image source: http://2.bp.blogspot.com/-Kl1ADjd3_7I/T6a8ZQV7ITI/AAAAAAAAKfE/qVyTQdJl2Do/s1600/make-big-changes-in-small-steps.png
  • 23. Map And Reduce Functions
  • 26. Sending Computation To Data Data Is Here! Computation Image source: http://www.conservationmagazine.org/wp-content/uploads/2011/03/ElephantAndMouse1.jpg
  • 27. MapReduce Implementation Image source: http://i3.mirror.co.uk/incoming/article1360046.ece/ALTERNATES/s615/Male+drones+tend+to+honeycomb+cells+in+a+bee+colony
  • 28. First Success: 5-Node Hadoop Cluster Image source: http://www.smallbiztechnology.com/wp-content/uploads/2012/12/success.jpg
  • 29. Apache Whirr And The Cloud
       ===== hadoop.properties =============
       whirr.cluster-name=production_cluster
       whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,4 hadoop-datanode+hadoop-tasktracker
       whirr.provider=aws-ec2 # or Rackspace cloudservers-us
       ...
       =====================================
       $ whirr launch-cluster --config hadoop.properties
       $ whirr destroy-cluster --config hadoop.properties
  • 30. First Sad (Non-Java Speaking) Developers Image source: http://www.shivayanaturals.com/wp-content/uploads/2012/01/Unhappy.jpg
  • 31. Hadoop Streaming For Scripting Languages Image source: http://www.mightystreamradio.com/PHOTOS/STREAM%20PHOTO%202.jpg
  • 32. Apache Hive Makes You Feel Younger Image source: http://majapszczolka.blox.pl/resource/Pszczolka_Maja_Baje_Pl_6.jpg
  • 33. Speak ~SQL, But Run As MapReduce
  • 34. HUE - Browser-Based Environment Image source: http://www.sentric.ch/wp-content/uploads/2013/01/Create-table-in-Hive.png
  • 35. Hive Is Based On & Limited By Hadoop
  • 36. Apache Pig Makes Them Happier! Image source: http://vetnolimits.files.wordpress.com/2012/02/pumba.jpg
  • 38. Need To Add More Relational Data To HDFS Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
  • 39. SQL To Hadoop = Sqoop Image source: http://3.bp.blogspot.com/_uuOo8x3WXWE/SuNV4y7qzeI/AAAAAAAAkYM/6RUExOMQPno/s400/pumpkin_eating_elephant.jpg
  • 40. Sqoop Import/Export Data Using MR Image source: http://blog.cloudera.com/blog/2011/10/apache-sqoop-overview/
  • 41. Apache Oozie For Defining Workflows Image source: Apache Oozie website
  • 42. Apache Oozie For Scheduling Image source: http://risingtechies.files.wordpress.com/2012/05/schedule.jpg
  • 43. Need To Add Even More Logs To HDFS Based on the image from http://blog.cloudera.com/blog/2011/06/biodiversity-indexing-migration-from-mysql-to-hadoop/
  • 44. Apache Flume For Data Collection e.g. JDBC, Memory, File Image source: Apache Flume website
  • 45. How To Manage A Larger Cluster
  • 46. Apache Avro + Snappy/Deflate_6 Image source: http://www.funkydiva.pl/wp-content/uploads/2012/10/lego-tapety-na-pulpit-duze-zdjecia-16.jpg
  • 47. When Latency Is Too High Image source: http://www.pharmacyowners.com/Portals/37772/images/It-can-be-a-LONG-wait-at-the-pharmacy-resized-600.jpg
  • 48. Cloudera Impala – Real-Time ~SQL Queries Image source: http://static.cargurus.com/images/site/2010/07/02/12/24/1969_chevrolet_impala-pic-2868587530424686499.jpeg
  • 49. Apache HBase - Random, Real-Time Access To Big Data Image source: http://www.superhqwallpapers.com/wp-content/uploads/2012/01/Super-Ferrari.jpg
  • 50. YARN – Hadoop Cluster More Robust Image source: http://globeattractions.com/wp-content/uploads/2012/01/green-leaf-drops-green-hd-leaf-nature-wet.jpg
  • 51. Hadoop Is Successfully Deployed Image source: http://bogdankipko.com/wp-content/uploads/2012/03/lessons-learned.jpg
  • 52. Learn More About Apache Hadoop?
  • 53. Use Hadoop To Solve Real-World Problems?
  • 54. Oozie And YARN At WHUG, Today @18:00
  • 55. Thank You! Any Questions About Them? Image source: http://xn--gryprzegldarkowe-43b.com.pl/wp-content/uploads/2012/05/me-free-zoo1.jpg
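
Slides 23 and 31 name the map and reduce functions and Hadoop Streaming, but the transcript carries no code for them. As a minimal sketch (not taken from the slides), a word count can be written as one map function and one reduce function in Python. The names map_fn, reduce_fn and run_local are hypothetical, and the shuffle step is simulated in a single process rather than run on a Hadoop cluster.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


def run_local(lines):
    """Simulate map -> shuffle (group by key) -> reduce in one process.

    On a real cluster the framework does the grouping and runs many
    mappers and reducers in parallel; this is only an illustration.
    """
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_local(["big data", "big cluster"]))
```

The same two functions could in principle be split into separate mapper and reducer scripts that read standard input and write tab-separated key/value lines, which is the contract Hadoop Streaming expects from scripting languages.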