Nov 2014 – Big data
What is Big Data?
* Data so large and complex that it becomes difficult to process with traditional systems
* Term first coined in 1997, in a NASA report
* Petabytes and exabytes of data
Big data is everywhere 
* Every 2 days we create as much information as we did from the beginning of time until 2003
* Google processes over 40 thousand search queries per second, which adds up to over 3.5 billion in a single day
* Around 100 hours of video are uploaded to YouTube every minute; it would take around 15 years to watch every video uploaded by users in one day
* Every minute we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand tweets, and upload 200,000 photos to Facebook
* Trillions of sensors monitor, track, and communicate with each other, populating the IoT with real-time data
Big data is not new
Characteristics
Volume 
* More data == better models
* Requires scalable storage and a distributed approach to querying
Variety 
* Big data includes all kinds of data: structured, semi-structured, and unstructured
* Data no longer fits into neatly structured tables
Velocity 
* Frequency at which data is generated, captured, stored, and processed
* Need for real-time processing
Data sources
Importance of Big Data 
* Media 
* Retailing 
* Public service 
* Health 
* Industry
Importance of Big Data 
* Gaining a more complete understanding of
→ business
→ customers
→ products
→ competitors
* Which can lead to
→ efficiency improvements
→ increased sales
→ lower costs
→ better customer service
→ improved products
The problem 
* Overall information available
→ 10% structured data: used in decision making
→ 90% unstructured data: wasted, not captured or analyzed
* Valuable information vs. data that is best left ignored
* 37.5% of large organizations said that analyzing big data is their biggest challenge
* More than 90% said that Big Data is a top-ten priority
It’s not only the size
* Collect -> Analyze -> Understand -> Generate Value
* Find meaning
* Find interconnections
* Find hidden data
Purpose 
* Take more precise actions that bring value and reduce costs
* Make the right decision within the right amount of time
How big will big data get? 
* From 3.2 zettabytes today to 40 zettabytes in only six years
* More than 30 billion devices will be wirelessly connected by 2020.
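As a rough check on the projection above, the implied yearly growth rate can be worked out directly. This is a minimal sketch that assumes smooth, compound growth between the two figures quoted on the slide; the slide itself does not say how the growth is distributed over the six years.

```python
# Implied compound annual growth rate for the slide's projection.
# Assumption (not from the slide): growth is smooth and exponential.
start_zb = 3.2   # zettabytes "today", as quoted on the slide
end_zb = 40.0    # zettabytes six years later, as quoted on the slide
years = 6

growth_factor = end_zb / start_zb                # overall multiplier
annual_growth = growth_factor ** (1 / years) - 1  # per-year growth rate

print(f"overall growth: {growth_factor:.1f}x")
print(f"implied annual growth: {annual_growth:.1%}")
```

Under that assumption the volume grows 12.5x overall, which works out to a little over 50% per year.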
Challenges 
* Storing data 
* Analysis 
* Search 
* Sharing 
* Transfer 
* Visualization
NoSQL and Big Data Analytics 
* Storing data 
* Distribution 
* Processing
NoSQL 
* Scalability / cluster-friendly
* Availability / fault tolerance
* Schema-less 
* Low latency 
* High performance 
* Open-source
Dynamic scaling 
* Nodes can be added or removed dynamically
→ storage and performance capacity can grow or shrink as needed
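One common way to let a cluster grow or shrink without reshuffling most of the data is consistent hashing. The sketch below is illustrative only: the slide does not name a technique, and the `HashRing` class, the node names, and the number of virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a large integer that is stable across runs."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes=(), points_per_node=100):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        index = bisect.bisect(self._ring, (stable_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])  # hypothetical node names
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # the cluster grows by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only the keys that now belong to the new node change owner; everything else stays where it was, which is what makes adding or removing nodes cheap.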
Auto-sharding 
* Natively and automatically spread data across servers 
* Data and query load automatically balanced across servers
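The idea of spreading data across servers can be sketched with simple hash-based routing. This is a minimal illustration, not how any particular database implements sharding; the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical servers


def shard_for(key: str) -> str:
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


keys = [f"order:{i}" for i in range(8000)]
load = Counter(shard_for(k) for k in keys)

for shard in SHARDS:
    print(shard, load[shard])
```

Because the hash spreads keys roughly evenly, each shard ends up with a similar share of the data, and a client can find the right server for a key without a central lookup.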
Replication 
* Supports automatic replication
→ high availability 
→ disaster recovery 
→ no need for separate applications to manage these tasks
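A toy model can show why replication gives high availability. The `ReplicatedStore` class below is a hypothetical in-memory sketch: it writes every value to all live replicas and reads from the first live one. Real systems also re-synchronise a node when it comes back, which this sketch leaves out.

```python
class ReplicatedStore:
    """In-memory sketch: every write goes to all live replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]
        self.alive = [True] * replica_count

    def put(self, key, value):
        for index, replica in enumerate(self.replicas):
            if self.alive[index]:
                replica[key] = value

    def get(self, key):
        # Read from the first live replica that holds the key.
        for index, replica in enumerate(self.replicas):
            if self.alive[index] and key in replica:
                return replica[key]
        raise KeyError(key)

    def fail(self, index):
        """Simulate losing one replica."""
        self.alive[index] = False


store = ReplicatedStore()
store.put("cart:7", ["keyboard", "mouse"])
store.fail(0)                # one server goes down
print(store.get("cart:7"))   # the data is still served by another replica
```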
Schemaless 
* No predefined schema 
* Insertion of aggregates 
→ puts together data that is commonly accessed together
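To make the notion of an aggregate concrete, the sketch below contrasts one order spread over relational-style rows with the same order stored as a single document. All names and values are invented for illustration.

```python
import json

# Relational style: one order is spread over three tables (rows as tuples).
customers = [(1, "Ana")]
orders = [(100, 1, "2014-11-03")]
order_lines = [(100, "keyboard", 2, 25.0), (100, "mouse", 1, 10.0)]

# Aggregate style: everything usually read together lives in one document.
order_aggregate = {
    "_id": 100,
    "date": "2014-11-03",
    "customer": {"id": 1, "name": "Ana"},
    "lines": [
        {"product": "keyboard", "qty": 2, "price": 25.0},
        {"product": "mouse", "qty": 1, "price": 10.0},
    ],
    # No predefined schema: a new field can appear on this document only.
    "gift_note": "Happy birthday!",
}

total = sum(line["qty"] * line["price"] for line in order_aggregate["lines"])
print(json.dumps(order_aggregate, indent=2))
print("order total:", total)
```

Reading the order needs a single lookup by `_id` instead of joins across three tables, which is the access pattern the aggregate is designed around.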
NoSQL flavors
* Key-value store
→ Amazon DynamoDB, Redis
→ Content caching (focus on scaling to huge amounts of data, designed to handle massive load), logging, etc.
* Document store
→ CouchDB, MongoDB
→ Web applications
* Column family store
→ Cassandra, HBase
→ Distributed file systems
* Graph store
→ Neo4j, InfoGrid, InfiniteGraph
→ Social networking, recommendations (focus on modeling the structure of data: interconnectivity)
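The four store types differ mainly in how they shape data. The sketch below models each shape with plain Python structures; it is a simplification for illustration, not the API of any of the products listed, and all names and values are made up.

```python
# Key-value store: the store only understands keys; values are opaque.
key_value = {"user:1:name": "Ana", "user:1:city": "Cluj"}

# Document store: self-describing, nested documents looked up by id.
document = {"_id": 1, "name": "Ana", "address": {"city": "Cluj"}}

# Column-family store: row key -> column family -> column -> value.
column_family = {
    "user:1": {
        "profile": {"name": "Ana", "city": "Cluj"},
        "activity": {"last_login": "2014-11-03"},
    }
}

# Graph store: nodes plus typed edges; queries walk the connections.
nodes = {1: {"name": "Ana"}, 2: {"name": "Dan"}, 3: {"name": "Ioana"}}
edges = [(1, "FOLLOWS", 2), (2, "FOLLOWS", 3)]


def followed_by(node_id):
    """Return the ids that node_id follows."""
    return [dst for src, rel, dst in edges if src == node_id and rel == "FOLLOWS"]


# A simple recommendation: people followed by the people user 1 follows.
direct = set(followed_by(1))
suggestions = {fof for f in direct for fof in followed_by(f)} - direct - {1}
print([nodes[n]["name"] for n in suggestions])
```

The graph example shows why that model suits recommendations: the query is a walk over relationships rather than a lookup by key.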
Reasons for choosing NoSQL 
* Working with large amounts of data
* Scaling out with ease
* Need for:
→ high availability
→ low-latency systems with eventual consistency
* Data model fits aggregates:
→ as a natural choice
→ structure changes over time
… and associates
What is Hadoop?
● Distributed file system 
● Distributed processing system 
● Batch / offline oriented 
● Open source
In the beginning... 
● Created by Doug Cutting and Mike Cafarella 
● Intended as distribution support for
● Based on Google's MapReduce and Google File System papers
Who uses Hadoop? 
Most notable users are … 
+ many others
Hadoop in the real world 
● Recommendation systems
● Data warehousing 
● Financial analysis 
● Market research/forecasting 
● Log analysis 
● Threat analysis 
● Image processing 
● Social networking 
● Advertising
Why Hadoop? 
● Scalable 
● Cost effective 
● Flexible 
● Efficient 
● Resilient to failure 
● Schema on read
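"Schema on read" means the raw data is stored as-is and a structure is applied only when someone reads it. The sketch below illustrates the idea on a few invented log lines; the `read_with_schema` helper and both schemas are hypothetical and not part of Hadoop.

```python
import csv
import io
from collections import Counter

# Raw data is stored untouched; no structure is declared at write time.
RAW_LOG = (
    "2014-11-03,GET,/index.html,200\n"
    "2014-11-03,GET,/missing,404\n"
    "2014-11-04,POST,/login,500\n"
)


def read_with_schema(raw_text, schema):
    """Yield one dict per line, applying (name, type) pairs while reading."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


# One reader wants typed access to every field.
ACCESS_SCHEMA = [("date", str), ("method", str), ("path", str), ("status", int)]
failed_paths = [
    record["path"]
    for record in read_with_schema(RAW_LOG, ACCESS_SCHEMA)
    if record["status"] >= 400
]

# Another reader only cares about the first column of the same raw data.
DATE_ONLY = [("date", str)]
requests_per_day = Counter(
    record["date"] for record in read_with_schema(RAW_LOG, DATE_ONLY)
)

print(failed_paths)
print(dict(requests_per_day))
```

Both readers work from the same stored bytes, so a new question about the data needs a new schema rather than a reload.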
Why not Hadoop? 
● Inefficient when used at small scale 
● Not suited to real-time systems
Hadoop major components 
● Hadoop Common
● YARN
● HDFS
● MapReduce
Architecture
MapReduce 
● Split input files
● Operate on key/value pairs
● Mappers filter and transform input data
● Reducers aggregate the mappers' output
● Move code to data
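The steps above can be sketched as a word count, the usual introductory MapReduce example. This is a single-process simulation in plain Python, not Hadoop's API: the function names are illustrative, and the "move code to data" step is not modelled because everything runs on one machine.

```python
from collections import defaultdict


def split_input(text, n_splits):
    """Divide the input lines into n_splits chunks, one per mapper."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]


def mapper(line):
    """Transform one input line into (word, 1) key/value pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Aggregate every value emitted for one key."""
    return key, sum(values)


def word_count(text, n_splits=2):
    mapped = []
    for chunk in split_input(text, n_splits):
        for line in chunk:
            mapped.extend(mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


counts = word_count("big data is big\ndata is everywhere")
print(counts)
```

In a real cluster each chunk would be processed by a mapper running on the node that already stores that chunk, and the shuffle would move the grouped pairs over the network to the reducers.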
… and associates
Apache Ambari 
The project is aimed at making Hadoop management simpler 
by developing software for provisioning, managing, 
and monitoring Apache Hadoop clusters
Apache Pig 
Apache Pig is a platform for analyzing large data sets that consists of a high-level 
language for expressing data analysis programs, coupled with infrastructure 
for evaluating these programs
Apache Hive 
The Apache Hive™ data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Hive provides a mechanism 
to project structure onto this data and query the data using a SQL-like language called HiveQL
Apache Chukwa 
Apache Chukwa is a data collection system for monitoring large distributed systems.
Chukwa comes with a flexible and powerful toolkit for displaying, monitoring and analyzing 
results to make the best use of the collected data.
Apache Avro 
A remote procedure call and data serialization framework
Apache HBase
Apache HBase offers random, real-time read/write access to your Big Data.
This project's goal is the hosting of very large tables 
-- billions of rows X millions of columns -- atop clusters of commodity hardware
Apache Mahout 
The Apache Mahout™ project's goal is to build a scalable machine learning library
Apache Spark 
Apache Spark™ is a fast and general engine for large-scale data processing
Apache ZooKeeper
ZooKeeper is a centralized service for maintaining configuration 
information, naming, providing distributed synchronization, and providing group services.
Big data – in the future 
● 87% of enterprises believe Big Data analytics will redefine the competitive landscape 
of their industries within the next three years 
● 89% believe that companies that do not adopt a Big Data analytics strategy in the next 
year risk losing market share and momentum.
Thank you!

Presentation: Big Data Demystified (Prezentare: Big Data demistificat)

  • 1.
  • 2. N o v 2 0 1 4 – B i g d a t a 2 Of 53 2 What is Big Data ? * Data so large and complex that it becomes difficult to process with traditional systems * First time coined in 1997, NASA report * Petabytes and Exabytes of data
  • 3. N o v 2 0 1 4 – B i g d a t a 3 Of 53 3 Big data is everywhere * Every 2 days we create as much information as we did from the beginning of time until 2003 * Google processes over 40 thousand search queries per second, making it over 3.5 billion in a single day. * Around 100 hours of video are uploaded to YouTube every minute and it would take you around 15 years to watch every video uploaded by users in one day * Every minute we send 204 million emails, generate 1,8 million Facebook likes, send 278 thousand Tweets, and upload 200,000 photos to Facebook * Trillions of sensors monitor, track, communicate with each other , populating the IoT with realtime data
  • 4. N o v 2 0 1 4 – B i g d a t a 4 Of 53 4 Big data is not new
  • 5. N o v 2 0 1 4 – B i g d a t a 5 Of 53 5 Characteristics
  • 6. N o v 2 0 1 4 – B i g d a t a 6 Of 53 6 Volume * More data beats == better model * Scalable storage, and distributed approach to querying
  • 7. N o v 2 0 1 4 – B i g d a t a 7 Of 53 7 Variety * Big data includes all data * Data no longer fits into neatly structured tables
  • 8. N o v 2 0 1 4 – B i g d a t a 8 Of 53 8 Velocity * Frequency at which data is generated, captured , stored and processed * Need for real-time processing
  • 9. N o v 2 0 1 4 – B i g d a t a 9 Of 53 9 Data sources
  • 10. N o v 2 0 1 4 – B i g d a t a 10 Of 53 10 Importance of Big Data * Media * Retailing * Public service * Health * Industry
  • 11. N o v 2 0 1 4 – B i g d a t a 11 Of 53 11 Importance of Big Data * Gaining a more complete understanding of business customers products competitors * Which can lead to efficiency improvements increased sales lower costs better customer service improved products
  • 12. N o v 2 0 1 4 – B i g d a t a 12 Of 53 12 The problem * Overall information available 10% structured data used in decision making 90% unstructured data wasted, not captured or analyzed * Valuable information VS data which is best left ignored * 37.5% of large organizations said that analyzing big data is their biggest challenge * More that 90% said that Big Data is a top ten priority
  • 13. N o v 2 0 1 4 – B i g d a t a 13 Of 53 13 It’s not the only the size * Collect -> Analyze -> Understand -> Generate Value * Find a meaning * Find interconnexions * Find hidden data
  • 14. N o v 2 0 1 4 – B i g d a t a 14 Of 53 14 Purpose * Take more precise actions that brings value and reduce costs * Make the right decision within the right amount of time
  • 15. N o v 2 0 1 4 – B i g d a t a 15 Of 53 15 How big will big data get? * 3.2 zettabytes today to 40 zettabytes in only six years. * More than 30 billion devices will be wirelessly connected by 2020.
  • 16. N o v 2 0 1 4 – B i g d a t a 16 Of 53 16 Challenges * Storing data * Analysis * Search * Sharing * Transfer * Visualization
  • 17. NoSQL and Big Data Analytics
      * Storing data
      * Distribution
      * Processing
  • 18. NoSQL
      * Scalability / cluster-friendly
      * Availability / fault tolerance
      * Schema-less
      * Low latency
      * High performance
      * Open source
  • 19. Dynamic scaling
      * Adding/removing nodes dynamically → storage/performance capacity can grow or shrink as needed
  • 20. Auto-sharding
      * Data is natively and automatically spread across servers
      * Data and query load are automatically balanced across servers
  • 21. Replication
      * Supports automatic replication → high availability → disaster recovery → no need for separate applications to manage these tasks
  • 22. Schemaless
      * No predefined schema
      * Insertion of aggregates → puts together data that is commonly accessed together
  • 23. NoSQL vanillas
  • 24. NoSQL vanillas
      * Key-value store → Amazon DynamoDB, Redis → content caching (focus on scaling to huge amounts of data, designed to handle massive load), logging, etc.
      * Document store → CouchDB, MongoDB → web applications
      * Column family store → Cassandra, HBase → distributed file systems
      * Graph store → Neo4j, InfoGrid, InfiniteGraph → social networking, recommendations (focus on modeling the structure of data: interconnectivity)
  • 25. Reasons for choosing NoSQL
      * Working with large amounts of data
      * Scaling out with ease
      * Need for:
          → high availability
          → low-latency systems with eventual consistency
      * Model fits aggregates:
          → as a natural choice
          → structure changes over time
  • 26. … and associates
  • 27. What is Hadoop?
      ● Distributed file system
      ● Distributed processing system
      ● Batch / offline oriented
      ● Open source
  • 28. In the beginning...
      ● Created by Doug Cutting and Mike Cafarella
      ● Intended as distribution support for …
      ● Built based on Google's MapReduce and Google File System papers
  • 29. Who uses Hadoop? Most notable users are … + many others
  • 30. Hadoop in the real world
      ● Recommendation systems
      ● Data warehousing
      ● Financial analysis
      ● Market research/forecasting
      ● Log analysis
      ● Threat analysis
      ● Image processing
      ● Social networking
      ● Advertising
  • 31. Why Hadoop?
      ● Scalable
      ● Cost effective
      ● Flexible
      ● Efficient
      ● Resilient to failure
      ● Schema on read
  • 32. Why not Hadoop?
      ● Inefficient when used at small scale
      ● Not good for real-time systems
  • 33. Hadoop major components
      ● Hadoop Common
      ● YARN
      ● HDFS
      ● MapReduce
  • 34. Architecture
  • 35. Architecture
  • 36. Architecture
  • 37. Architecture
  • 38. Architecture
  • 39. MapReduce
      ● Split input files
      ● Operate on key/value pairs
      ● Mappers filter & transform input data
      ● Reducers aggregate mappers' output
      ● Move code to data
  • 40.
  • 41. … and associates
  • 42. Apache Ambari
      The project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters.
  • 43. Apache Pig
      Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
  • 44. Apache Hive
      The Apache Hive™ data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
  • 45. Apache Chukwa
      Chukwa is a data collection system for monitoring large distributed systems. It comes with a flexible and powerful toolkit for displaying, monitoring and analyzing results to make the best use of the collected data.
  • 46. Apache Avro
      A remote procedure call and data serialization framework.
  • 47. Apache HBase
      Apache HBase offers random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows × millions of columns) atop clusters of commodity hardware.
  • 48. Apache Mahout
      The Apache Mahout™ project's goal is to build a scalable machine learning library.
  • 49. Apache Spark
      Apache Spark™ is a fast and general engine for large-scale data processing.
  • 50. Apache ZooKeeper
      ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
  • 51. Big data – in the future
      ● 87% of enterprises believe Big Data analytics will redefine the competitive landscape of their industries within the next three years
      ● 89% believe that companies that do not adopt a Big Data analytics strategy in the next year risk losing market share and momentum
  • 52. Big data – in the future
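
The MapReduce steps listed on slide 39 (split the input, operate on key/value pairs, map, then reduce) can be illustrated with a small in-memory word count. This is only a sketch in plain Python, not Hadoop's API: the function names `split_input`, `mapper`, `shuffle`, `reducer` and `word_count` are hypothetical, and everything runs in one process, so the "move code to data" aspect is not modeled.

```python
from collections import defaultdict


def split_input(text, n_splits):
    """Split the input lines into n_splits chunks, one per (simulated) mapper."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]


def mapper(lines):
    """Transform input lines into (key, value) pairs: one (word, 1) per word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all values by key, as happens between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Aggregate the mappers' output for a single key."""
    return key, sum(values)


def word_count(text, n_splits=2):
    splits = split_input(text, n_splits)
    mapped = (pair for split in splits for pair in mapper(split))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


counts = word_count("big data is big\nhadoop processes big data")
```

In a real cluster each split would be processed by a mapper running on the node that already stores that block of the file, and the shuffle would move data across the network; here those steps are ordinary function calls.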

Editor's Notes

  1. 1880: the US census. Big data is relative to the system and relative to the organization.
  2. The fact that we manage to generate and store so much information is also due to the evolution of storage devices in terms of the storage capacity/price ratio.
  3. Characteristics...
  4. Volume...
  5. The main attraction of big data: once analyzed, a large volume of data can yield better behavioral patterns/models. Weather forecasting: 6 vs. 300 factors (images, satellite logs, plus temperature and pressure sensors in air and water, etc.). Distribution. Variety...
  6. Data is generated in dozens of formats: audio, video, logs, GPS coordinates, documents, SMS, emails. We have no control over the types of data used as input. We cannot impose a structure on the data in order to control the analysis. Velocity...
  7. F1 → sensors → TB of information → real-time processing → equipment adjustment. Retailers → fast analysis of clickstreams → recommendations. Data sources...
  8. Users as data generators: search, clickstreams, social network addiction and the emergence/spread of smartphones. Public web: IMDb, Wikipedia, organizations that make large data sets from various fields available. Archives. Systems + sensors: log generation; the largest producers of data. Each source is characterized by the 3 Vs. Importance... examples...
  9. Media: Netflix → producing its own programs. Public service: courier fleet → reduced maintenance costs. Health: hospital network → patterns → the evolution of a disease or of health status as a function of various parameters.
  10. The problem...
  11. Generating value...
  12. Purpose...
  13. How big?...
  14. 25,000 machines; more than 10 clusters; 3 petabytes of data (compressed, unreplicated); default replication in Hadoop is 3; 700+ users; 10,000+ jobs/week. In 2010 Facebook claimed that they had the largest Hadoop cluster in the world with 21 PB of storage.[36] On June 13, 2012 they announced the data had grown to 100 PB.[37] On November 8, 2012 they announced the data gathered in the warehouse grows by roughly half a PB per day.[38]
  15. Hadoop Common: CLI MiniCluster, native libraries, security.
      HDFS: data is distributed and replicated over multiple machines; designed for large files (where “large” means GB to TB); block oriented; Linux-style commands, e.g. ls, cp, mv, rm, etc.
      HDFS topics: replication, replica placement, replica selection, rack awareness, safemode, robustness, rebalancer, heartbeats, data integrity, persistence of metadata.
      Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
      Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
      Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
      Hadoop MapReduce – a programming model for large-scale data processing.
  16. Speed: run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
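
Notes 14 and 15 mention 3 PB of unreplicated data, a default replication factor of 3, and block-oriented storage in HDFS. The arithmetic behind those figures can be sketched as below. The helper names `raw_storage_bytes` and `block_count` are hypothetical, and the 128 MiB block size is an assumption that does not come from the notes (it is a common HDFS default; older releases used 64 MiB).

```python
PB = 10**15          # decimal petabyte, in bytes
MIB = 1024**2        # mebibyte, in bytes

# Assumption: 128 MiB blocks; the notes do not state a block size.
ASSUMED_BLOCK_SIZE = 128 * MIB


def raw_storage_bytes(logical_bytes, replication=3):
    """Disk space consumed when every block is stored `replication` times."""
    return logical_bytes * replication


def block_count(file_bytes, block_size=ASSUMED_BLOCK_SIZE):
    """Number of blocks a file occupies (the last block may be partially full)."""
    return (file_bytes + block_size - 1) // block_size


# 3 PB of logical data at replication factor 3 needs 9 PB of raw disk.
raw = raw_storage_bytes(3 * PB)

# A 1 GiB file is stored as 8 blocks of 128 MiB, each replicated 3 times.
blocks = block_count(1024**3)
replicas = blocks * 3
```

The same multiplication explains why cluster capacity has to be planned at roughly three times the size of the data set when the default replication factor is kept.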