2. Contents:
What Is Big Data?
Why Big Data?
Hadoop
Hadoop Architecture
Hadoop Distributed File System
HDFS Architecture
MapReduce
How MapReduce Works
Hadoop Ecosystem
What Is Hadoop Used For?
Users of Hadoop
Advantages and Disadvantages of Hadoop
Conclusion
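3. What Is Big Data?
Big Data is commonly characterized by three Vs:
Volume: the sheer amount of data.
Variety: the many forms it takes (text, images, video, logs and more).
Velocity: the speed at which it is generated and must be processed.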
4. Why Big Data?
Mobile phone users increased 70.3%, to 918m, in the last two years.
Twitter has 328m monthly active users, up 55%.
Facebook has 765m active users.
Google+ has 495m monthly active users, up 45%.
LinkedIn has 300m users.
Every minute, 48 hours of video are posted.
5. Hadoop:
Open-source distributed computing framework.
Written mainly in Java.
Named by Doug Cutting after his son’s toy elephant.
Hadoop handles both storage and processing of data.
6. Hadoop Architecture:
Hadoop is designed and built on two independent frameworks:
Hadoop Distributed File System (HDFS)
MapReduce
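7. Hadoop Distributed File System:
Based on the Google File System (GFS).
Stores data in the form of blocks (128 MB by default in Hadoop 2 and later).
Provides data reliability by replicating each block on several nodes (three copies by default).
Provides fast processing by letting computation run on the nodes that already hold the data.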
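To make the block idea concrete, here is a minimal Python sketch, assuming the 128 MB default block size (Hadoop 2 and later) and the default replication factor of three. The function names `split_into_blocks` and `place_replicas` and the DataNode names are hypothetical, and the round-robin placement is a deliberate simplification: real HDFS uses a rack-aware placement policy chosen by the NameNode.

```python
from typing import Dict, List

MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # HDFS default block size in Hadoop 2 and later
DEFAULT_REPLICATION = 3        # HDFS default replication factor


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> List[int]:
    """Return the size of each block a file of `file_size` bytes is cut into.

    Every block is full except possibly the last, which holds only the remainder.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = DEFAULT_REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes.

    Hypothetical round-robin placement, NOT the real rack-aware HDFS policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([size // MB for size in blocks])  # [128, 128, 44]
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    for block, nodes in placement.items():
        print(block, nodes)
```

Because each block has three copies on different nodes in this sketch, losing any one DataNode still leaves two readable copies of every block it held, which is the intuition behind the reliability claim.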
9. MapReduce:
MapReduce processes data in two phases:
Map: takes a set of data and breaks individual elements into tuples (key/value pairs).
Reduce: takes Map’s output as input and combines those data tuples into a smaller set of tuples.
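The classic illustration of these two phases is word count. The following is a minimal, single-process Python sketch of the idea, not Hadoop's Java API: `map_phase` emits a `(word, 1)` tuple per word, a `shuffle` step (which the framework performs between the phases) groups tuples by key, and `reduce_phase` sums each group. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: break a line into individual words and emit a (word, 1) tuple for each."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values emitted for the same key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all tuples for one key into a single (word, total) tuple."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Big data is big", "data is everywhere"]))
    # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Here the map phase emits 7 tuples (one per word) and the reduce phase turns them into 4, showing how Reduce yields a smaller set of tuples. In a real cluster, many map tasks would run in parallel on different blocks of the input.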
12. What Is Hadoop Used For?
• Search: Yahoo, Amazon
• Log processing: Facebook, Yahoo
• Data warehouse: Facebook, AOL
• Video and image analysis: New York Times
14. Advantages of Hadoop:
Platform independent.
Block-structured file system.
Can store any kind of data, structured or unstructured.
Huge storage capacity.
Rapidly processes large amounts of data in parallel.
Fault tolerance.
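The last two points work together: the work is split into independent tasks that run in parallel, and a task that fails is simply run again. Hadoop re-executes failed tasks across a cluster; the minimal Python sketch below only imitates that on one machine, with threads standing in for nodes. The retry limit, the function names and the simulated failure are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

MAX_ATTEMPTS = 4  # hypothetical retry limit for this sketch


def run_with_retries(task: Callable[[str, int], int], chunk: str,
                     max_attempts: int = MAX_ATTEMPTS) -> int:
    """Run `task` on one chunk, re-running it if it raises, up to `max_attempts` times."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk, attempt)
        except RuntimeError as err:
            last_error = err  # treat as a failed task attempt and try again
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def count_words(chunk: str, attempt: int) -> int:
    """Toy task. Hypothetical failure: chunks containing 'flaky' crash on the first attempt."""
    if "flaky" in chunk and attempt == 1:
        raise RuntimeError("simulated node failure")
    return len(chunk.split())


def parallel_total(chunks: List[str], workers: int = 4) -> int:
    """Process all chunks concurrently and add up the per-chunk results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda chunk: run_with_retries(count_words, chunk), chunks)
        return sum(results)


if __name__ == "__main__":
    chunks = ["big data", "a flaky node here", "hadoop"]
    print(parallel_total(chunks))  # 7
```

The "flaky" chunk fails once and succeeds on its second attempt, so the job still returns the full total; only a task that keeps failing past the limit makes the whole job fail.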
15. Disadvantages of Hadoop:
Not a good fit for small data.
Setup and configuration are complex.
The MapReduce programming model is very restrictive.
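One reason Hadoop is not a good fit for small data is that the NameNode keeps metadata for every file and every block in memory, so many small files cost far more metadata than a few large ones holding the same bytes. The Python sketch below only counts those metadata entries (one per file plus one per block); it assumes the 128 MB default block size and ignores directories and the memory each entry takes. The function name is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB
BLOCK_SIZE = 128 * MIB  # assumed HDFS default block size (Hadoop 2 and later)


def namenode_objects(num_files: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Count file + block entries the NameNode must track for `num_files` equal-sized files."""
    if num_files < 0 or file_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive and num_files non-negative")
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    return num_files * (1 + blocks_per_file)


if __name__ == "__main__":
    # The same 1 GiB of data, stored two different ways:
    print(namenode_objects(1, GIB))     # 9    (1 file + 8 blocks)
    print(namenode_objects(1024, MIB))  # 2048 (1024 files + 1024 blocks)
```

Storing the same gigabyte as 1024 small files needs over 200 times as many entries as storing it as one large file.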
16. Summary
Hadoop excels at Big Data analytics and batch processing.
It is not real-time, offers no random access, and is not a database.
HDFS makes it all possible:
Fault-tolerant file system
High-throughput data access
Pig and Hive are easy to use.