This is a PowerPoint presentation on Hadoop and Big Data. It covers the essential knowledge one should have when stepping into the world of Big Data.
This course is available on hadoop-skills.com for free!
This course builds a fundamental understanding of Big Data problems and of Hadoop as a solution. It takes you through:
• Big Data problems, with easy-to-understand examples and illustrations.
• The history and advent of Hadoop, right from when Hadoop wasn’t even named Hadoop and was called Nutch.
• The Hadoop magic that makes it so unique and powerful.
• The difference between data science and data engineering, which is one of the big confusions in selecting a career or understanding a job role.
• And most importantly, demystifying Hadoop vendors like Cloudera, MapR and Hortonworks.
3. Facebook, Twitter and Google generate petabytes of data every day.
The Hadron Collider project discards large amounts of data that it cannot analyse, hoping that nothing valuable has been thrown away.
Interesting facts, but... why is Big Data important?
Let’s understand via an example.
12. International Data Corporation’s (IDC) 6th annual study:
From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes.
That is more than 5,200 gigabytes for every man, woman, and child in 2020.
From now until 2020, the digital universe will roughly double every two years.
33% of the digital data might be valuable if analysed, compared with 25% today.
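The IDC figures above can be cross-checked with simple arithmetic. The following is a minimal Python sketch, assuming decimal units (1 exabyte = 10^9 gigabytes); the variable names are illustrative and do not come from the study.

```python
import math

# Assumption: decimal units, 1 exabyte = 10**9 gigabytes.
GB_PER_EB = 10**9

start_eb = 130       # size of the digital universe in 2005 (exabytes)
end_eb = 40_000      # projected size in 2020 (exabytes)
years = 2020 - 2005  # 15 years

# Growth factor over the whole period (the slide rounds this to 300).
growth_factor = end_eb / start_eb

# 40,000 exabytes expressed in gigabytes (40 trillion).
total_gb = end_eb * GB_PER_EB

# Number of doublings needed, and the implied time per doubling.
doublings = math.log2(growth_factor)
doubling_time_years = years / doublings

# Population implied by "5,200 gigabytes per person".
implied_population = total_gb / 5_200

print(f"growth factor: about {growth_factor:.0f}x")
print(f"total size: {total_gb:.1e} GB")
print(f"doubling time: about {doubling_time_years:.1f} years")
print(f"implied population: about {implied_population / 1e9:.1f} billion")
```

The implied doubling time comes out a little under two years, which is consistent with the statement that the digital universe roughly doubles every two years.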
From Gartner:
4.4 Million IT Jobs Globally to Support Big Data By 2015.
14. Hadoop timeline (1996-2000, 2003-04, 2005-06, 2010, 2013)
1996-2000: Big Data problem faced by all search engines
2003-04: Google File System and MapReduce papers
2005-06: Hadoop spawns off Nutch
2013: YARN / MapReduce 2 / Next Generation Hadoop
Other timeline labels: and Mike; Dreadnaught; Doug joins Cloudera; 0.xx releases of Hadoop
23. Complex algorithm on a small dataset vs. simple algorithm on a large dataset
1. Complex algorithms need to be correctly sensitive to weak correlations.
2. Complex algorithms are thus difficult to code and design.
24. Data Engineer vs. Data Scientist
Role
Data Engineer: To engineer software solutions.
Data Scientist: To solve business problems using data.
Skills
Data Engineer: Programming and technical skills, and the ability to architect technical solutions.
Data Scientist: Strong mathematical skills and an understanding of statistical models.
25. -> Skeleton version
-> All the ecosystems need to be additionally installed.
-> Important ecosystem members included.
-> Few proprietary tools like Enterprise Manager.
-> Proprietary Hadoop code written in C.
-> Integrated with Hadoop ecosystem members.
-> Based out of Apache Hadoop.
-> Supports .NET framework
-> Launches Hadoop Distribution: Pivotal HD