MindScripts Technologies is a leading Big Data Hadoop training institute in Pune, providing a complete Big Data Hadoop course with Cloudera certification.
5. Business analytics
Business analytics focuses on developing new insights and an understanding of business performance based on data and statistical methods.
8. Velocity
How fast data is being produced and how fast the data must be processed to meet
demand.
Have a look through the analytics lens!
9. Variability
Data flow can be highly inconsistent, with periodic peaks.
Is something big trending on social media?
Note the difference between Variety and Variability.
10. Megabytes, Gigabytes…
Terabyte: to put it in perspective, a Terabyte could hold about 300 hours of good-quality video, or 1,000 copies of the Encyclopedia Britannica.
Petabyte: a Petabyte could hold 500 billion pages of standard printed text.
Exabyte: it has been said that 5 Exabytes would equal all of the words ever spoken by mankind.
12. Challenges of Big Data
The sheer size of Big Data.
Big Data is unstructured or semi-structured.
There is no point in just storing big data if we can't process it.
14. Hadoop enables a computing solution that is:
Scalable – new nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
Cost-effective – Hadoop brings massively parallel computing to commodity servers.
Flexible – Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.
Fault-tolerant – when you lose a node, the system redirects work to another copy of the data and continues processing without missing a beat.
17. Course Content
Introduction
Hadoop: Basic Concepts
What is Hadoop?
The Hadoop Distributed File System
How Hadoop MapReduce works
Anatomy of a Hadoop cluster
Hadoop daemons
Master daemons
Name node
Job tracker
Secondary name node
Slave daemons
Data node
Task tracker
18.
HDFS (Hadoop Distributed File System)
Blocks and Splits
Input splits
HDFS splits
Data replication
Hadoop rack awareness
Data high availability
Data integrity
Cluster architecture and block placement
Accessing HDFS
Java approach
CLI approach
Programming Practices
Developing MapReduce programs in:
Local mode
Running without HDFS and MapReduce
Pseudo-distributed mode
Running all daemons on a single node
Fully distributed mode
Running daemons on dedicated nodes
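As a taste of the Blocks and Splits topic above, here is a small Python sketch, not HDFS source code: it carves a file into fixed-size blocks the way HDFS does. The 64 MB block size and replication factor of 3 are just the classic Hadoop defaults.

```python
# Illustrative sketch, not HDFS source: carve a file into fixed-size
# blocks the way HDFS does. Each block would then be replicated
# (3 copies by default) across DataNodes, ideally on more than one rack.

BLOCK_SIZE = 64 * 1024 * 1024   # bytes; the classic HDFS default
REPLICATION = 3                 # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for each HDFS-style block."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 200 MB file needs four blocks; the last one holds the 8 MB tail.
blocks = split_into_blocks(200 * 1024 * 1024)
print(len(blocks))        # 4
print(blocks[-1])         # (201326592, 8388608)
```

Note that the last block only occupies as much space as its data needs, which is why HDFS stores many large files efficiently but wastes NameNode memory on many tiny ones.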
19.
Writing a MapReduce Program
Examining a Sample MapReduce Program
With several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
Common MapReduce Algorithms
Sorting and Searching
Indexing
Classification/Machine Learning
Term Frequency - Inverse Document Frequency
Word Co-Occurrence
Hands-On Exercise: Creating an Inverted Index
Identity Mapper
Identity Reducer
Exploring well-known problems using MapReduce applications
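The Streaming API topic above can be sketched in pure Python: Hadoop pipes input lines to the mapper's stdin, sorts the mapper's tab-separated output by key, and pipes it to the reducer's stdin. A minimal word-count pair, with function names chosen here for illustration:

```python
from itertools import groupby

# Hadoop Streaming word count, sketched as plain functions so the logic
# is visible. On a cluster this would run roughly as (paths are
# placeholders):
#   hadoop jar hadoop-streaming.jar -input in/ -output out/ \
#       -mapper mapper.py -reducer reducer.py

def mapper(lines):
    """Emit one 'word<TAB>1' pair per word."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum counts per word; the input must already be sorted by word."""
    pairs = (line.strip().split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# The sorted() call stands in for Hadoop's shuffle-and-sort phase.
mapped = sorted(mapper(["to be or", "not to be"]))
print(list(reducer(mapped)))   # ['be\t2', 'not\t1', 'or\t1', 'to\t2']
```

The same two-function shape (map emits key-value pairs, the framework sorts, reduce aggregates per key) underlies the sorting, indexing, and TF-IDF exercises listed above.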
20.
Debugging MapReduce Programs
Testing with MRUnit
Logging
Other Debugging Strategies.
Advanced MapReduce Programming
A Recap of the MapReduce Flow
The Secondary Sort
Customized Input Formats and Output Formats
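The secondary sort listed above is the trickiest item here, so a small Python simulation may help: MapReduce only sorts keys during the shuffle, so the sort field is folded into a composite key and the framework's sort, not the reducer, puts values in order. Station names and readings below are invented:

```python
# Secondary sort, simulated: fold the sort field into a composite key,
# sort on the whole key, but group reducer input by the natural key only.

def secondary_sort(records):
    """records: (station, temperature) pairs.
    Returns {station: [temperatures ascending]} without ever sorting a
    value list - only keys are sorted, as in Hadoop's shuffle."""
    # Map: move the temperature into a composite (station, temp) key.
    composite = [(station, temp) for station, temp in records]
    # Shuffle: the framework sorts by the full composite key...
    composite.sort()
    # ...while a grouping comparator groups by station alone.
    grouped = {}
    for station, temp in composite:
        grouped.setdefault(station, []).append(temp)
    return grouped

data = [("pune", 31), ("mumbai", 29), ("pune", 24), ("mumbai", 33)]
print(secondary_sort(data))   # {'mumbai': [29, 33], 'pune': [24, 31]}
```

In real Hadoop the same split of duties is expressed with a custom partitioner, sort comparator, and grouping comparator; this sketch only shows why the values arrive pre-sorted.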
21. Hadoop Ecosystem
HBase
HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi-node cluster
Create a database; develop and run sample applications
Access data stored in HBase using clients such as Java, Python, and Perl
HBase and Hive integration
HBase admin tasks
Defining a schema and basic operations
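To make the HBase concepts above concrete, here is a toy Python model, not an HBase client: a table maps row key to "family:qualifier" columns holding timestamped versions, a Get returns the newest cell version, and a Scan walks rows in sorted row-key order. The class, row keys, and column names are invented.

```python
import time

# Toy model of HBase's storage layout (row key -> "cf:qualifier" ->
# {timestamp: value}); all names below are illustrative.

class MiniHBaseTable:
    def __init__(self):
        self.rows = {}   # row key -> {"cf:qual": {ts: value}}

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time_ns()
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Newest version of one cell, like HBase's default Get."""
        versions = self.rows.get(row, {}).get(column, {})
        return versions[max(versions)] if versions else None

    def scan(self, start_row=""):
        """Rows in lexicographic row-key order, like an HBase Scan."""
        for row in sorted(self.rows):
            if row >= start_row:
                yield row, self.rows[row]

t = MiniHBaseTable()
t.put("user#001", "info:name", "Asha", ts=1)
t.put("user#001", "info:name", "Asha K", ts=2)   # newer version wins
t.put("user#002", "info:name", "Ravi", ts=1)
print(t.get("user#001", "info:name"))            # Asha K
print([row for row, _ in t.scan()])              # ['user#001', 'user#002']
```

The sorted-by-row-key property is what HBase region servers exploit: each region serves one contiguous row-key range, which is why row-key design is the central schema decision.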
22.
Hive
Hive concepts
Hive architecture
Install and configure Hive on a cluster
Create a database and access it from a Java client
Buckets
Partitions
Joins in Hive
Inner joins
Outer joins
Hive UDFs
Hive UDAFs
Hive UDTFs
Develop and run sample applications in Java/Python to access Hive
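Bucketing, listed above, follows one simple rule worth seeing once: Hive assigns a row to bucket hash(bucketing column) mod number-of-buckets, so equal keys always land in the same bucket file, which is what makes bucketed map-side joins possible. A Python sketch with a stand-in hash (Hive uses its own per-type hash; Python's built-in hash() is salted per process, so it is avoided here):

```python
# Sketch of Hive's bucketing rule: bucket = hash(key) % num_buckets.
# The character-sum hash is a stable stand-in, not Hive's actual hash.

NUM_BUCKETS = 4

def bucket_for(key, num_buckets=NUM_BUCKETS):
    h = sum(ord(ch) for ch in str(key))
    return h % num_buckets

rows = ["user1", "user2", "user1", "user7"]
buckets = {}
for key in rows:
    buckets.setdefault(bucket_for(key), []).append(key)

# Both copies of "user1" share one bucket, whatever the bucket count.
print(buckets[bucket_for("user1")].count("user1"))   # 2
```

Partitions, by contrast, split a table by a column's literal value (one directory per value) rather than by a hash, so the two features solve different pruning problems.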
23.
Pig
Pig basics
Install and configure Pig on a cluster
Pig vs. MapReduce and SQL
Pig vs. Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in the Grunt shell
Programming in Eclipse
Running as a Java program
Pig UDFs
Pig macros
Flume
Flume concepts
Install and configure Flume on a cluster
Create a sample application to capture logs from Apache using Flume
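For the Apache-log exercise above, a minimal single-agent Flume configuration might look like the fragment below. The agent name, log path, and HDFS URL are placeholders; the shape is the standard Flume source-channel-sink wiring.

```properties
# Hypothetical Flume agent "a1": tail an Apache access log with an exec
# source, buffer events in a memory channel, write them to HDFS.
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type    = exec
a1.sources.r1.command = tail -F /var/log/apache2/access.log

a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type          = hdfs
a1.sinks.k1.hdfs.path     = hdfs://namenode:8020/flume/apache-logs
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel    = c1
```

The agent would then be started with something like `flume-ng agent --conf-file apache.conf --name a1`.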
24.
Sqoop
Getting Sqoop
A Sample Import
Database Imports
Controlling the import
Imports and consistency
Direct-mode imports
Performing an Export
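A hedged sketch of the sample import above: the connection string, credentials, and target directory are placeholders, and `--num-mappers` controls the degree of parallelism covered under "Controlling the import".

```shell
# Hypothetical Sqoop 1 import of one MySQL table into HDFS with four
# parallel map tasks; add --direct for a direct-mode import where the
# database's native dump tool supports it.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username reports \
  --table employees \
  --target-dir /user/hadoop/employees \
  --num-mappers 4
```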
25. Contact Us
Address
MindScripts Technologies,
2nd Floor, Siddharth Hall,
Near Ranka Jewellers,
Behind HP Petrol Pump,
Karve Rd,
Pune 411004
Call
9595957557
8805674210
9764560238
9767427924
9881371828
Address
MindScripts Technologies,
C8, 2nd Floor, Sant Tukaram Complex ,
Pradhikaran, Above Savali Hotel,
Opp Nigdi Bus Stand,
Nigdi,
Pune - 411044
www.mindscripts.com
info@mindscripts.com