This document provides an overview of a course on data science and big data analytics. The course aims to build a fundamental understanding of big data problems and of Hadoop as a solution. It covers understanding big data and Hadoop, the Hadoop architecture and its components, using Pig, Hive, HBase, and Oozie with Hadoop, and performing data manipulation and machine learning with R. The course structure includes VM installation, lectures on various Hadoop topics, and hands-on practice with Hadoop and R. Assessments include a quiz, a case study, and the installation of Hadoop clusters. The target audience is students who have completed courses in C, Java, data structures, operating systems, and computer networks.
Workshop 1
1. Data Science and Big Data Analytics
N Chandra Shekar
Assistant Professor
Department of CSE
RGUKT RK Valley
NChandu, CSE, RKV
Big Data by India is licensed under a Creative Commons Attribution 4.0 International License.
2. Course Description
This course builds a fundamental understanding of Big Data problems and of
Hadoop as a solution. It takes you through:
1. Understanding Big Data problems through easy-to-understand examples.
2. The history and advent of Hadoop, from before it was even named Hadoop.
3. What the "Hadoop magic" is that makes it so unique and powerful.
4. The difference between data science and data engineering, which is one
of the biggest sources of confusion when choosing a career or understanding a job role.
5. Most importantly, demystifying Hadoop vendors such as Cloudera, MapR, and
Hortonworks.
3. Learning Outcomes
* Describe the Big Data landscape, including examples of real-world Big Data problems
and the three key sources of Big Data: people, organizations, and sensors.
* Explain the V's of Big Data (volume, velocity, variety, veracity, valence, and value)
and why each impacts data collection, monitoring, storage, analysis, and reporting.
* Get value out of Big Data by using a 5-step process to structure your analysis.
* Identify what are and what are not big data problems and be able to recast big data
problems as data science questions.
* Provide an explanation of the architectural components and programming models used
for scalable big data analysis.
* Summarize the features and value of core Hadoop stack components, including the
YARN resource and job management system, the HDFS file system, and the MapReduce
programming model.
* Install and run a program using Hadoop!
4. Course Structure
1. VM Installation
2. Understanding Big Data and Hadoop
3. Hadoop Architecture and HDFS
4. Hadoop MapReduce Framework
5. Pig, Hive, HBase, Oozie
6. Data Manipulation Using R
7. Machine Learning Techniques Using R
5. Delivery Format
The course will be delivered through a combination of blended and online modes.
6. Learning Activities
1. A discussion forum will be created on the Moodle course page, where students can post
their doubts, which can be clarified by the teacher or by any fellow student.
2. Students will participate in an activity in which they evaluate the work of fellow
students, which might include evaluating quizzes.
7. Assessment
1. Quiz 1: Introduction to Big Data
2. Case Study: Cloudera Cluster
3. Installation of single-node and multi-node Hadoop clusters
4. Installation of R and RStudio on Ubuntu
8. Expected Participation
1. Ideally, students from E3 and E4 who have completed courses such as C, Java, DS, OS,
and CN should enroll in the course.
2. Students from E1 and E2 can enroll in the course only to learn the R programming
language.