3. Webinar Objective
What is Open Source Framework?
Open Source Framework Example
Difference from other types of Framework?
Importance of open source framework
What is Big Data?
Deep understanding of Hadoop and Spark framework
Job Analysis on Hadoop and Spark
www.edunextgen.com 3 ©2018 EduNextgen
4. What is Open Source Framework?
The term “Open Source" refers to something people can modify and share because its design is publicly accessible
The source code is publicly available, so anyone can inspect, modify, and enhance it
6. Difference from other types of Framework
Open Source Framework:
Source code available publicly
Redistribute solutions
Can use in any way
Eliminates single point of failure
Democratic forum for action
No vendor lock-in
No guarantee that development will continue
Intellectual property (algorithms) is publicly exposed
Support consistency can vary
Proprietary Software:
Predictable releases
Entity to hold responsible for bugs, errors and updates
Consistent feature development
More stable framework
More consistent training options
Easier access to support
Single company releasing patches
Higher costs for start-ups
The vendor owns the software
7. Importance of open source framework
Control: Users have more control over the software. It can be changed and modified as requirements demand, and it is possible to
experiment and implement whatever best suits the requirement.
Training: Because open source code is publicly accessible, students can easily study it as they learn to make better software. This
helps develop their skills.
Security: Open source software can be more secure and stable, because anyone can view and modify the code, so someone may spot and
correct errors or omissions that a program's original authors missed.
Stability: Long-term projects become possible. Because programmers publicly distribute the source code for open
source software, users relying on that software for critical tasks can be sure their tools won't disappear or fall into disrepair if
their original creators stop working on them.
8. What is Big Data?
Big Data refers to extremely large data sets that may be analyzed to reveal patterns, trends, and
associations, especially relating to human behavior and interactions
Big data is a term that describes the large volume (Terabytes or Petabytes) of
data – both structured and unstructured – that inundates a business on a day-
to-day basis
Big Data is a collection of huge data sets that cannot be handled in
traditional ways
Working with Big Data includes capturing data, data storage, data analysis,
visualization, querying, etc.
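The "traditional way" breaks down once a data set no longer fits in memory on one machine. As a rough, hedged illustration (the record layout and field names below are invented, not from any real system), plain Python can at least process such a stream chunk by chunk, so only a bounded slice is ever held in memory:

```python
# Illustrative only: aggregating a stream of records chunk by chunk,
# so that no more than `chunk_size` records are held in memory at once.
# The generator below stands in for a data source too large to load whole.
from itertools import islice

def record_stream(n):
    """Stand-in for a huge data source (e.g. log lines arriving daily)."""
    for i in range(n):
        yield {"user": f"user{i % 5}", "bytes": 100 + i}

def chunked(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

def total_bytes_per_user(stream, chunk_size=1000):
    """Running aggregation: only one chunk is in memory at a time."""
    totals = {}
    for chunk in chunked(stream, chunk_size):
        for rec in chunk:
            totals[rec["user"]] = totals.get(rec["user"], 0) + rec["bytes"]
    return totals

totals = total_bytes_per_user(record_stream(10_000))
print(len(totals))  # five distinct users were aggregated
```

At real Big Data scale the same idea is distributed: each machine aggregates its own chunks and partial totals are merged, which is exactly what frameworks like Hadoop automate.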
9. Introduction to Hadoop
Hadoop is an open-source data management framework that
supports storing and processing big data
Hadoop is an Apache project. It is used by Yahoo,
Facebook, Twitter, LinkedIn and many more
It allows the distributed processing of huge data sets across clusters
Hadoop was developed by Doug Cutting and Mike Cafarella in
2006
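Hadoop's distributed processing is built on the MapReduce model. The sketch below imitates the classic word-count pattern on a single machine in plain Python; it is not Hadoop's real (Java) API, just an illustration of the map, shuffle and reduce phases that the framework distributes across a cluster:

```python
# Single-machine sketch of the MapReduce word-count pattern that
# Hadoop runs in parallel across many nodes. Not Hadoop's actual API.
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "big data processing"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # → 3
```

In Hadoop, the map tasks run on the nodes where the data blocks live, and only the shuffled intermediate pairs move across the network.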
10. Necessity of Hadoop
Make strategic, confident decisions based on solid data and advanced analytics
Gain valuable business insights that help you pinpoint weaknesses and discover
new opportunities
Earn higher profits by better understanding the business, processes and the
customers
Big Data is flowing in at an exponential rate
An increasing number of Hadoop-driven jobs
11. Hadoop Characteristics
Hadoop Is Easily Scalable
Hadoop Brings Flexibility In Data Processing
Hadoop Is Fault Tolerant
Hadoop Is Great At Faster Data Processing
Hadoop Ecosystem Is Robust
Hadoop Is Very Cost Effective
12. Introduction to Apache Spark
Apache Spark is an open-source, in-memory cluster computing framework.
Apache Spark provides an interface for programming entire clusters with implicit data
parallelism and fault tolerance.
Apache Spark provides high-level APIs in Scala, Java, R and Python.
Spark can be up to 100x faster than Hadoop MapReduce
Spark supports stream processing for large data sets
Version 1.0 released in May 2014
Stable release as of July 2017: v2.2.0
Spark is written primarily in Scala
Operating system support: Microsoft Windows, macOS, Linux
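Spark's high-level API is a chain of transformations over a distributed data set. As a dependency-free sketch, the same map/filter/reduce style can be imitated in plain Python; in real PySpark this pipeline would start from `SparkContext.parallelize` and execute lazily across the cluster, which is not shown here:

```python
# Plain-Python imitation of a Spark-style pipeline. In actual PySpark this
# would be sc.parallelize(data).map(...).filter(...).reduce(...), evaluated
# lazily and in parallel; here it runs eagerly on one machine.
from functools import reduce

data = range(1, 11)

# map: square each element; filter: keep even squares; reduce: sum them
squares = map(lambda x: x * x, data)
even_squares = filter(lambda x: x % 2 == 0, squares)
total = reduce(lambda a, b: a + b, even_squares)

print(total)  # even squares of 1..10: 4 + 16 + 36 + 64 + 100 = 220
```

The appeal of Spark's model is that the same few transformation calls scale from this toy range to terabytes, with the framework handling partitioning and fault tolerance.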
14. Why Spark?
Speed:
Spark performs in-memory computations
It extends the MapReduce model and takes it to a whole other level
Spark can be up to 100x faster than Hadoop MapReduce
Generality:
Spark is able to handle a wide range of workloads
Iterative algorithms
It provides interactive queries and stream processing
Ease of use:
Spark has APIs in Scala, Python and Java
It contains libraries for ML, SQL, streaming and graph processing
Spark runs on Hadoop clusters, Mesos, etc., and can access data sources such as Cassandra
15. Job Opportunities
After the U.S., India has the largest demand for analytics / big data / data science professionals. Amid such
demand, people often find themselves confused about selecting an appropriate job profile for the best future.
“A professional with working knowledge of data science and big data earns 8% more than co-workers without it”
16. Job Opportunities (Cont’d)
89% of hiring managers find it difficult to find talent
47% of employers are willing to pay for professional certifications, up from 33% in 2017
Positions they are looking for:
73% – Developers
60% – DevOps
53% – SysAdmins
Employers are seeking expertise in:
70% – Cloud
67% – Big Data
65% – Linux
18. Next Webinar: Execute your First Hive Project
What is Big Data?
Why do we need Big Data?
What is Hive?
Basic Hive Operations & Commands:
Create database, Show databases, Use, Create table, Show tables, Describe, data
loading into a Hive table from the local filesystem, inserting data into a Hive table,
Select *
Retail domain project execution with Hive:
Use Case #1: Out of 20,000 customers, how many gave a product rating
Use Case #2: Find how many products are available for the brands below:
Puma
Regular
First Choice
Note: Show product details for "Puma"
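HiveQL is close to standard SQL, so both retail use cases boil down to a filtered count and a GROUP BY aggregation. As a hedged stand-in (the table, columns and sample rows below are invented; the real project's schema may differ), Python's built-in sqlite3 can sketch the queries:

```python
# Hedged stand-in for the Hive use cases: HiveQL is SQL-like, so sqlite3
# is used here to sketch the queries. The products table, its columns
# (name, brand, rating) and the rows are invented for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, brand TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Shoe A", "Puma", 4), ("Shoe B", "Puma", None),
     ("Shirt C", "Regular", 5), ("Bag D", "First Choice", None)],
)

# Use Case #1 (sketched): count the rows that received a rating
# (NULL standing in for "no rating given").
rated = conn.execute(
    "SELECT COUNT(*) FROM products WHERE rating IS NOT NULL"
).fetchone()[0]

# Use Case #2 (sketched): number of products per brand.
per_brand = dict(conn.execute(
    "SELECT brand, COUNT(*) FROM products GROUP BY brand"
).fetchall())

print(rated, per_brand["Puma"])
```

In the actual project the same statements would run against Hive tables loaded from the retail data set rather than an in-memory SQLite database.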
19. Hadoop Kick-Starter Course
What is this course about?
Get insights into applications of Big Data and Hadoop while learning to perform basic operations with HDFS,
MapReduce and Hive. The course is bundled with industry-grade hands-on assignments and project access, provided
through a VM environment, to practice what you learn. The program helps you understand the career path in Big Data and
the available learning paths to advance your career options.
Duration: 6 Hrs.
Date: 20th & 21st January, 2018
Time: 07:30 PM to 10:30 PM
Price: ₹ 499
Participants will get access to
Course Content (LMS Access)
10+ Assignments
20+ Quizzes
Pre-Installed Hadoop environment (Plug and Play)
1 Project with 5 Use Cases
20. Hadoop Kick-Starter Course Curriculum
Day #1:
What is Big Data?
Why Do We Need More and More Data?
Big Data Characteristics:
Volume, Velocity, Variety, Veracity
Types of Data
Applications of Big Data
Industries that Generate Big Data
Introduction to Hadoop
Why Hadoop?
Hadoop Ecosystem
YARN
Day #2:
Hive: Introduction
What is Hive and its Limitations?
Hive Architecture
Hive Components
Hive Data Types:
Primary Data Types
Complex Data Types
Various Hive Commands and Operations
Joins in Hive
Project Execution
Editor's Notes 1. Hadoop Brings Flexibility In Data Processing:
One of the biggest challenges organizations have had in the past was handling unstructured data. Hadoop manages data whether it is structured or unstructured, encoded or formatted, or any other type. Hadoop brings value to the table by making unstructured data useful in the decision-making process.
2. Hadoop Is Easily Scalable
This is a huge feature of Hadoop. It is an open source platform and runs on industry-standard hardware. That makes Hadoop an extremely scalable platform where new nodes can easily be added to the system as data volume and processing needs grow, without altering anything in the existing systems or programs.
3. Hadoop Is Fault Tolerant
In Hadoop, data is stored in HDFS, where it automatically gets replicated at two other locations. So even if one or two of the systems collapse, the file is still available on at least a third system. This brings a high level of fault tolerance.
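A hedged toy model of this replication behavior (the node names and block layout below are invented): with three replicas per block, a block stays readable as long as any one replica's node survives:

```python
# Toy model of HDFS-style 3x replication: each block is copied to three
# nodes, so the block survives as long as at least one replica node is up.
REPLICATION_FACTOR = 3

nodes = ["node1", "node2", "node3", "node4"]  # invented cluster
block_locations = {"block_A": nodes[:REPLICATION_FACTOR]}  # node1..node3

def block_available(block, failed_nodes):
    """A block is readable if any node holding a replica is still alive."""
    return any(n not in failed_nodes for n in block_locations[block])

# Two of the three replica nodes fail; the third copy keeps the block readable.
print(block_available("block_A", failed_nodes={"node1", "node2"}))  # True
# All three replica nodes fail: this block's data is gone.
print(block_available("block_A", failed_nodes={"node1", "node2", "node3"}))  # False
```

Real HDFS additionally re-replicates under-replicated blocks onto healthy nodes after a failure, restoring the replica count automatically.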
4. Hadoop Is Great At Faster Data Processing
Hadoop is extremely good at high-volume batch processing because of its ability to do parallel processing. Hadoop can perform batch processes roughly 10 times faster than a single-threaded server or a mainframe.
5. Hadoop Ecosystem Is Robust:
Hadoop has a very robust ecosystem that is well suited to the analytical needs of developers and of small to large organizations. The Hadoop ecosystem comes with a suite of tools and technologies, making it well suited to a variety of data processing needs.
6. Hadoop Is Very Cost Effective
Hadoop generates cost benefits by bringing massively parallel computing to commodity servers, resulting in a substantial reduction in the cost per terabyte of storage, which in turn makes it reasonable to model all your data.