2. Agenda
The Concepts
Types of Databases
The Big Data and Hadoop
Hadoop Approach
HDFS Architecture
RDBMS vs Hadoop
About the Trainer
DO NOT COPY 2
3. The Concepts
Data science involves principles, processes, and techniques for understanding phenomena via the
(automated) analysis of data.
Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of
data, rather than purely on intuition.
Data analytics (DA) is the science of examining raw data in order to draw
conclusions about the information it contains. Data analytics is used in many industries to allow
companies and organizations to make better business decisions, and in the sciences to verify or
disprove existing models or theories.
A data warehouse is constructed by integrating data from multiple heterogeneous sources. It
supports analytical reporting, structured and/or ad hoc queries, and decision making.
Big data is a collection of datasets so large that they cannot be processed using
traditional computing techniques. Big data is not merely data; it has become a complete
subject that involves various tools, techniques, and frameworks.
5. The Big Data and Hadoop
Big data is a broad term for data sets so large or
complex that traditional data processing applications
are inadequate. Challenges include analysis, capture,
search, sharing, storage, transfer, visualization, and
information privacy.
Hadoop is an Apache open source framework written in
Java that allows distributed processing of large datasets
across clusters of computers using simple programming
models. An application built on the Hadoop framework runs
in an environment that provides distributed storage and
computation across clusters of computers. Hadoop is
designed to scale up from a single server to thousands of
machines, each offering local computation and storage.
6. Hadoop Approach
Hadoop runs applications using the
MapReduce programming model, in which
the data is processed in parallel on
different nodes of a cluster. In short, the
Hadoop framework makes it possible to
develop applications that run on clusters
of computers and perform complete statistical
analysis on huge amounts of data.
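To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs on one machine. In a real cluster, the map and reduce calls would run in parallel on different nodes and the framework would handle the grouping step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "hadoop processes big data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is the property Hadoop exploits.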
7. HDFS Architecture
The system hosting the namenode acts as the master server and performs the
following tasks:
Manages the file system namespace.
Regulates clients' access to files.
Executes file system operations such as renaming, closing, and
opening files and directories.
Datanodes manage the data storage of the systems they run on.
Datanodes perform read-write operations on the file system, as per
client requests.
They also perform operations such as block creation, deletion, and
replication according to the instructions of the namenode.
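The division of labour described above can be sketched as a toy in-memory model. This is an illustration only, not the HDFS API: the classes DataNode and NameNode, the helpers client_write and client_read, and the round-robin placement rule are all assumptions made for this example (real HDFS uses its own placement policy and network protocols). The point it shows is that the namenode keeps only metadata, while datanodes hold the actual bytes.

```python
class DataNode:
    """Toy datanode: stores block contents keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Toy namenode: keeps metadata only (file -> blocks -> datanodes)."""

    def __init__(self, datanodes, replication=2):
        self.datanodes = datanodes
        self.replication = min(replication, len(datanodes))
        self.namespace = {}  # path -> list of (block_id, [DataNode, ...])
        self._next_id = 0

    def create(self, path, num_blocks):
        """Allocate block ids and choose target datanodes for a new file."""
        plan = []
        for i in range(num_blocks):
            block_id = f"blk_{self._next_id}"
            self._next_id += 1
            # Simplified round-robin placement (an assumption of this sketch).
            targets = [self.datanodes[(i + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            plan.append((block_id, targets))
        self.namespace[path] = plan
        return plan

    def locate(self, path):
        """Tell a client which datanodes hold each block of a file."""
        return self.namespace[path]

    def rename(self, old_path, new_path):
        """A namespace operation: no block data moves, only metadata changes."""
        self.namespace[new_path] = self.namespace.pop(old_path)


def client_write(namenode, path, chunks):
    """The client asks the namenode for a plan, then writes to the datanodes."""
    for (block_id, targets), chunk in zip(namenode.create(path, len(chunks)), chunks):
        for node in targets:
            node.write_block(block_id, chunk)


def client_read(namenode, path):
    """The client reads each block from the first datanode that holds it."""
    return b"".join(targets[0].read_block(block_id)
                    for block_id, targets in namenode.locate(path))
```

Note that rename touches only the namenode's dictionary; the datanodes are never contacted, which mirrors why namespace operations belong to the master server.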
Generally, user data is stored in HDFS files. A file in the file system is divided into one or more
segments, which are stored on individual datanodes. These file segments are called blocks. In other words,
a block is the unit in which HDFS stores and transfers file data. The default block size was 64 MB in Hadoop 1.x
(128 MB from Hadoop 2.x onward), and it can be changed in the HDFS configuration as needed.
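As a quick worked example of how a file maps onto blocks, the sketch below computes the block sizes for a given file size. The function name split_into_blocks is hypothetical and the 64 MB default simply mirrors the figure quoted on this slide; this is arithmetic only, not a call into HDFS.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the sizes in bytes of the blocks a file of file_size bytes occupies."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block holds only the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    for block_mb in (64, 128):
        sizes = split_into_blocks(200 * MB, block_mb * MB)
        print(block_mb, "MB blocks:", [s // MB for s in sizes])
```

For a 200 MB file, a 64 MB block size gives three full blocks plus one 8 MB block, while a 128 MB block size gives one full block plus one 72 MB block. Larger blocks mean fewer entries for the namenode to track.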
9. About the Trainer
Mr Akash Pramanik
Oracle Database Administrator by profession and a
freelance Trainer/Teacher by passion. With
exceptional presentation and training program
design abilities, I have provided training to
employees, students, interns, and fresher trainees using
classroom, conference, and online facilities.
I specialize in Oracle Database (12c, 11g, 10g),
Oracle Apps (11i, R12), Oracle Business
Intelligence, Oracle FMW products, Oracle Data
Integrator, Oracle GoldenGate, Exadata, PL/SQL,
MongoDB, Hadoop, Teradata, Linux, Unix, etc.
I am also proficient in training on non-technical
subjects like Software Development Life Cycle,
Information Life Cycle Management, Project
Planning and Management, Communication and
Personality Development, US Accent Training, etc.
10. Thank You
Follow me at –
http://akashpramanik.blogspot.in
https://in.linkedin.com/in/akashpramanik