Hadoop and IoT 
Darko Marjanović 
Đorđe Stepanić 
Miloš Milovanović
AGENDA 
BIG DATA 
HADOOP AND IOT MODEL 
HADOOP 
IOT 
HADOOP DATA PROCESSING 
HIVE 
STINGER INITIATIVE 
Q&A
BIG DATA 
Big Data describes collections of data sets so large and complex that they are 
difficult to capture, process, store, search and analyze using conventional 
database systems. 
Anything that Won't Fit in Excel. 
*Definition taken from (www.bigdata-startups.com)
BIG DATA DIMENSIONS 
1992 100GB/Day 
2002 100GB/Second 
2013 28,000GB/Second 
2018 50,000GB/Second
HADOOP AND IOT
HADOOP 
Apache Hadoop is an open-source software framework for storage and large-scale 
processing of data-sets on clusters of commodity hardware. 
Hadoop was created by Doug Cutting and Mike Cafarella in 2005. 
All the modules in Hadoop are designed with a fundamental assumption that 
hardware failures are common and thus should be automatically handled in software 
by the framework.
HADOOP COMPONENTS 
Hadoop common 
HDFS 
MapReduce 
YARN (Starting with Hadoop 2.x.x)
HADOOP HDFS 
The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file-system 
written in Java for the Hadoop framework.
HADOOP MAPREDUCE 
MapReduce is a programming model and an associated implementation for processing 
and generating large data sets with a parallel, distributed algorithm on a cluster.
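The model can be illustrated without a cluster. The following is a minimal single-process sketch, not Hadoop's own API: the function names and the `device_id,temperature` line format are assumptions chosen to fit the IoT theme. It shows the three stages (map, shuffle, reduce) by computing the maximum temperature per device.

```python
from collections import defaultdict


def map_reading(line):
    """Map step: turn one raw line into (key, value) pairs."""
    device_id, temperature = line.split(",")
    yield device_id, float(temperature)


def shuffle(pairs):
    """Shuffle step: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_max(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, max(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_reading(line))
    grouped = shuffle(pairs)
    return dict(reduce_max(k, v) for k, v in grouped.items())


# Hypothetical sensor readings in "device_id,temperature" form.
readings = ["s1,21.5", "s2,19.0", "s1,23.1", "s2,18.4"]
result = run_job(readings)
print(result)
```

On a real cluster the map and reduce functions run in parallel on different nodes, and the shuffle moves data between them over the network.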
HADOOP YARN 
Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management 
technology. YARN is now characterized as a large-scale, distributed operating 
system for big data applications.
HADOOP ECOSYSTEM 
The main groups of tools in the Hadoop ecosystem: 
Data Ingestion (Flume, Sqoop …) 
Data Processing (Pig, Hive, Storm …) 
Cluster Management (Ambari) 
Security (Knox)
DATA INGESTION 
Flume 
Flume is a distributed, reliable, and available service for efficiently collecting, 
aggregating, and moving large amounts of streaming event data. 
Sqoop 
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache 
Hadoop and structured datastores such as relational databases. 
WEB HDFS REST API
FLUME EXAMPLE
SQOOP AND WEB HDFS API EXAMPLE
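The original slide content for this example is not available in the text. As a rough sketch of how the WebHDFS REST API is addressed, the snippet below only builds request URLs of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The host name, port and file paths are placeholders, and `webhdfs_url` is a hypothetical helper, not part of any Hadoop client library.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL: http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Placeholder NameNode address; replace with the real host and HTTP port.
HOST, PORT = "namenode.example.com", 50070

# LISTSTATUS lists a directory, OPEN reads a file; both are sent as HTTP GET.
list_url = webhdfs_url(HOST, PORT, "/data/iot", "LISTSTATUS")
open_url = webhdfs_url(HOST, PORT, "/data/iot/readings.csv", "OPEN", offset=0)
print(list_url)
print(open_url)
```

Any HTTP client can then issue the request, which makes WebHDFS a convenient ingestion path for devices and gateways that cannot run a native Hadoop client.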
IOT
UBIQUITOUS COMPUTING & INTERNET OF THINGS 
Ubiquitous computing - a trend (wave) in computing where computers are 
spread throughout our everyday environment. 
Concept: one person - many computers 
Internet of Things - the network of physical objects accessed through the 
Internet that contain embedded technology to interact (sense and 
communicate) with their internal states or the external environment 
(Cisco definition).
INTERNET OF THINGS COMPONENTS
INTERNET OF THINGS AND BIG DATA
REAL-TIME DATA, STRUCTURED AND UNSTRUCTURED DATA GENERATED FROM INTERNET OF THINGS
INTERNET OF THINGS - FIELDS OF APPLICATION 
- Production: energy savings, lower maintenance costs, prediction of 
machine failure, quality control, etc. 
- Logistics: efficient supply control, optimization of transport, 
environmental controls in the warehouse, JIT, lean logistics, better capacity 
utilization, etc. 
- Smart cities & environment: smart parking, traffic congestion, smart 
lighting, waste management, urban noise maps, air pollution, etc. 
- Smart agriculture 
- eHealth 
- and everything else you can imagine...
HADOOP DATA PROCESSING 
Input: 
- Raw data files 
- No metadata 
- No schema 
Objective: 
- Perform analysis, run interactive queries 
- Explore, structure and analyze the data 
- Real-time processing (Apache Storm) 
- Visualization
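Structuring raw, schema-less input usually means applying a schema at read time rather than at write time. The sketch below illustrates that idea in plain Python. The column names, the timestamp format and the sample lines are assumptions made for the example, not part of the original slides.

```python
import csv
import io
from datetime import datetime

# Hypothetical schema: (column name, converter) pairs applied while reading.
SCHEMA = [
    ("device_id", str),
    ("ts", lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")),
    ("temperature", float),
]


def read_with_schema(raw_text, schema=SCHEMA):
    """Apply a schema while reading; the stored data itself stays untyped."""
    for row in csv.reader(io.StringIO(raw_text)):
        if len(row) != len(schema):
            continue  # skip malformed lines instead of failing the whole read
        yield {name: convert(value) for (name, convert), value in zip(schema, row)}


# Raw text as it might arrive from devices, including one malformed line.
raw = "s1,2014-10-01 08:00:00,21.5\ns2,2014-10-01 08:00:05,19.0\nbroken line\n"
records = list(read_with_schema(raw))
print(records[0]["temperature"], len(records))
```

Because the schema lives outside the files, the same raw data can later be read with a different schema without rewriting it.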
HIVE 
Apache Hive is data warehouse software that facilitates querying and 
managing large datasets residing in distributed storage. 
Hive provides: 
- Tools for ETL processes 
- A mechanism for imposing a structure on a variety of data formats 
- Access to files stored in HDFS or other storage systems 
- Query execution via MapReduce
HIVE ARCHITECTURE 
Data Model: 
- Tables 
- Partitions 
- Buckets 
SerDes 
Data types: 
- Common primitive data types (int, boolean, float, double, string, char, date, 
timestamp, …) 
- Complex data types (structs, maps, arrays) 
Components: 
- UI 
- Driver 
- Compiler 
- Metastore 
- Execution engine
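The tables, partitions and buckets of the data model map onto storage in a simple way: each partition is a `key=value` directory under the table location, and each bucket is one of a fixed number of files chosen by hash modulo. The sketch below illustrates that layout. The paths and values are hypothetical, and `key_hash` stands in for Hive's own type-dependent hash function, which is not reproduced here.

```python
def partition_dir(table_location, **partition_values):
    """Partitions map to nested key=value directories under the table location."""
    parts = "/".join(f"{key}={value}" for key, value in partition_values.items())
    return f"{table_location}/{parts}"


def bucket_for(key_hash, num_buckets):
    """Buckets split a partition into a fixed number of files by hash modulo."""
    # Mask to a non-negative 31-bit value before taking the modulo.
    return (key_hash & 0x7FFFFFFF) % num_buckets


# Hypothetical table location and partition columns.
path = partition_dir("/warehouse/sensor_readings", dt="2014-10-01", region="eu")
bucket = bucket_for(key_hash=12345, num_buckets=8)
print(path, bucket)
```

A query that filters on a partition column only needs to read the matching directories, which is why partitioning by date is common for time-stamped sensor data.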
HIVE.NOW 
Hive defines a simple SQL-like query language, called HQL, that enables users 
familiar with SQL to query the data. 
Scalable and extensible. 
Most commonly used for: 
- Log analysis 
- Statistical analysis 
- Document indexing
HIVE SCRIPT EXAMPLE
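The script shown on the original slide is not available in the text. As an illustration only, the sketch below holds a small HiveQL script in a Python string: it defines an external, partitioned table over comma-delimited files and runs one aggregate query. The table name, columns, HDFS location and date are all hypothetical.

```python
# Hypothetical table name and HDFS location; adjust to the real data set.
TABLE = "sensor_readings"
LOCATION = "/data/iot/readings"

HIVE_SCRIPT = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS {TABLE} (
  device_id   STRING,
  ts          TIMESTAMP,
  temperature DOUBLE
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{LOCATION}';

SELECT device_id, AVG(temperature) AS avg_temperature
FROM {TABLE}
WHERE dt = '2014-10-01'
GROUP BY device_id;
"""

# Split into individual statements, e.g. to submit them one at a time.
statements = [s.strip() for s in HIVE_SCRIPT.split(";") if s.strip()]
print(len(statements))
```

The script text could be saved to a file and run with `hive -f`. Partitions of an external table still need to be registered (for example with `ALTER TABLE ... ADD PARTITION`) before the query returns rows.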
STINGER INITIATIVE 
Stinger is an initiative to improve query execution time and extend SQL 
functionality in Apache Hive. Microsoft and Hortonworks worked actively in the 
Apache community towards completing Stinger. 
Announced in February 2013 
44 companies, 145 developers, 392,000 lines of Java code 
Hive 0.13 
Speed: Hive on Tez, vectorized query engine & cost-based optimizer 
Scale: dynamic partition loads and smaller hash tables 
SQL: CHAR & DECIMAL datatypes, subqueries for IN / NOT IN 
Improved Hive performance up to 100x.
STINGER.NEXT 
Stinger.next is a continuation of the Stinger initiative to further improve speed, 
scale and SQL support in Hive within the open Apache Hive community. 
Main goals: 
- transactions with ACID semantics 
- sub-second queries 
- SQL:2011 Analytics 
- usability improvements 
To be delivered in the next 18 months.
HIVE ON SPARK 
Apache Spark is a fast and general engine for large-scale data processing. 
Spark powers a stack of high-level tools including Spark SQL, MLlib for machine 
learning, GraphX, and Spark Streaming. 
Hive-Spark Machine Learning Integration will allow Hive users to run machine 
learning models via Hive.
STINGER.NEXT 
*Photo taken from the official Hortonworks website (www.hortonworks.com)
Q&A 
darko@thingsolver.com 
djordje@thingsolver.com 
milosmilovanovic@outlook.com 
hadoop-srbija.com
Hadoop and IoT Sinergija 2014

 

Hadoop and IoT Sinergija 2014

  • 2. Hadoop and IoT Darko Marjanović Đorđe Stepanić Miloš Milovanović
  • 3. AGENDA BIG DATA HADOOP AND IOT MODEL HADOOP IOT HADOOP DATA PROCESSING HIVE STINGER INITIATIVE Q&A
  • 4. BIG DATA Big Data describes the collection of data sets so large and complex that they are difficult to capture, process, store, search and analyze using conventional database systems. Anything that won't fit in Excel. *Definition taken from (www.bigdata-startups.com)
  • 5. BIG DATA DIMENSIONS 1992: 100 GB/day; 2002: 100 GB/second; 2013: 28,000 GB/second; 2018: 50,000 GB/second
  • 7. HADOOP Apache Hadoop is an open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop was created by Doug Cutting and Mike Cafarella in 2005. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and thus should be handled automatically in software by the framework.
  • 8. HADOOP COMPONENTS Hadoop common HDFS Map Reduce YARN (Starting with Hadoop 2.x.x)
  • 9. HADOOP HDFS The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file-system written in Java for the Hadoop framework.
  • 10. HADOOP MAP REDUCE Map Reduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
  • 11. HADOOP YARN Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. YARN is now characterized as a large-scale, distributed operating system for big data applications.
  • 12. HADOOP ECOSYSTEM The main groups of tools in the Hadoop ecosystem: Data Ingestion (Flume, Sqoop …) Data Processing (Pig, Hive, Storm …) Cluster Management(Ambari) Security (Knox)
  • 13. DATA INGESTION Flume Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming event data. Sqoop Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. WEB HDFS REST API
  • 16. SQOOP AND WEB HDFS API EXAMPLE
  • 17. IOT
  • 18. UBIQUITOUS COMPUTING & INTERNET OF THINGS Ubiquitous computing - a trend (wave) in computing where computers are spread throughout our everyday environment. Concept: one person - many computers. Internet of Things - the network of physical objects accessed through the Internet, which contain embedded technology to interact (sense and communicate) with internal states or the external environment (Cisco definition).
  • 19. INTERNET OF THINGS COMPONENTS
  • 20. INTERNET OF THINGS AND BIG DATA
  • 21. REAL-TIME DATA, STRUCTURED AND UNSTRUCTURED DATA GENERATED FROM INTERNET OF THINGS
  • 22. INTERNET OF THINGS - FIELDS OF APPLICATION Production - energy savings, lower maintenance costs, prediction of machine failure, quality control etc. Logistics - efficient supply control, optimization of transport, environmental controls in the warehouse, JIT, lean logistics, better capacity utilization etc. Smart cities & environment - smart parking, traffic congestion, smart lighting, waste management, urban noise maps, air pollution etc. Smart agriculture, eHealth and everything you can imagine...
  • 23. HADOOP DATA PROCESSING Input: - Raw data files - No metadata - No schema Objective: - Perform analysis, run interactive queries - Explore, structure and analyze the data - Real-time processing (Apache Storm) - Visualization
  • 24. HIVE Apache Hive is data warehouse software that facilitates querying and managing large datasets residing in distributed storage. Hive provides: - Tools for ETL processes - A mechanism for imposing a structure on a variety of data formats - Access to files stored in HDFS or other storage systems - Query execution via MapReduce
  • 25. HIVE ARCHITECTURE Data Model: - Tables - Partitions - Buckets. SerDes. Datatypes: common primitive data types (int, boolean, float, double, string, char, date, timestamp, …) plus complex data types (structs, maps, arrays). Components: UI, Driver, Compiler, Metastore, Execution engine
  • 26. HIVE.NOW Hive defines a simple SQL-like query language, called HQL, that enables users familiar with SQL to query the data. Scalable and extensible. Most commonly used for: - Log analysis - Statistical analysis - Document indexing
  • 28. STINGER INITIATIVE Stinger is the initiative to improve query execution time and increase SQL functionality for Apache Hive. Microsoft and Hortonworks worked actively in the Apache community towards completing Stinger. Announced in February 2013. 44 companies, 145 developers, 392,000 lines of Java code. Hive 0.13: Speed - Hive on Tez, vectorized query engine & cost-based optimizer; Scale - dynamic partition loads and smaller hash tables; SQL - CHAR & DECIMAL datatypes, subqueries for IN / NOT IN. Improved Hive performance up to 100x.
  • 29. STINGER.NEXT Stinger.next is a continuation of the Stinger initiative to further improve speed, scale and SQL in Hive in the open Apache Hive community. Main goals: - transactions with ACID semantics - sub-second queries - SQL:2011 Analytics - usability improvements. To be delivered in the next 18 months.
  • 30. HIVE ON SPARK Apache Spark is a fast and general engine for large-scale data processing. Spark powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. Hive-Spark Machine Learning Integration will allow Hive users to run machine learning models via Hive.
  • 31. STINGER.NEXT *Photo taken from the official Hortonworks website (www.hortonworks.com)
  • 32. Q&A darko@thingsolver.com djordje@thingsolver.com milosmilovanovic@outlook.com hadoop-srbija.com
  • 33. Please rate this lecture and win a Windows Phone NOKIA Lumia 1320. Help us choose the best Sinergija lecturer! Microsoft will award you: at the end of the conference, we'll give one NOKIA Lumia 1320 to a randomly chosen member of the audience. Go to www.mssinergija.net, log in and cast your votes! You can rate only lectures that you attended, and each only once. The more lectures you rate, the more chances you have. The winner will be announced on the official Sinergija web portal, www.mssinergija.net
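
The MapReduce programming model from slide 10 can be illustrated with a small, single-process sketch in Python. This is not Hadoop code and not taken from the talk: the sensor readings, the device names and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all hypothetical. It only mimics the three logical steps (map, group by key, reduce) that Hadoop would distribute across a cluster.

```python
from collections import defaultdict

# Hypothetical IoT sensor readings: (device_id, temperature in degrees Celsius).
READINGS = [
    ("sensor-a", 21.5),
    ("sensor-b", 19.0),
    ("sensor-a", 22.5),
    ("sensor-b", 20.0),
    ("sensor-c", 25.0),
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    device_id, temperature = record
    yield device_id, temperature


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result (the mean)."""
    return key, sum(values) / len(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    for device_id, mean_temperature in run_job(READINGS).items():
        print(device_id, mean_temperature)
```

On a real cluster the map and reduce functions would run in parallel on different nodes and the grouping step would move data over the network; the sketch keeps everything in memory only to make the data flow visible.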

Editor's Notes

  1. Agenda
  2. Microsoft and Hortonworks have a shared vision of open innovation in and around Apache Hadoop and a commitment to deliver that via a 100% open source platform.