3. Webinar Objective
What is Open Source Framework?
Open Source Framework Example
Difference from other types of Framework?
Importance of open source framework
What is Big Data?
Deep understanding of Hadoop and Spark framework
Job Analysis on Hadoop and Spark
www.edunextgen.com 3 ©2018 EduNextgen
4. What is Open Source Framework?
The term “Open Source" refers to something people can modify and share because its design is publicly accessible
The source code is publicly available, so anyone can inspect, modify, and enhance it
6. Difference from other types of Framework
Open Source Framework:
Source code available publicly
Redistribute solutions
Can use in any way
Eliminates single point of failure
Democratic forum for action
No vendor lock-in
No guarantee that development will continue
Intellectual property (algorithms) is publicly exposed
Support consistency can vary
Proprietary Software:
Predictable releases
Entity to hold responsible for bugs, errors and updates
Consistent feature development
More stable framework
More consistent training options
Easier access to support
Single company releasing patches
Higher costs for start-ups
The vendor owns the software
7. Importance of open source framework
Control: Users have more control over the software. It can be changed and modified as requirements demand, and it is possible to
experiment and implement whatever best suits the requirement.
Training: Because open source code is publicly accessible, students can easily study it as they learn to make better software. This
helps develop their skills.
Security: Open source software can be more secure and stable, because anyone can view and modify the code, so someone may spot and
correct errors or omissions that a program's original authors missed.
Stability: Long-term projects become possible. Because programmers publicly distribute the source code for open
source software, users relying on that software for critical tasks can be sure their tools won't disappear or fall into disrepair if
their original creators stop working on them.
8. What is Big Data?
Big Data refers to extremely large data sets that may be analyzed to reveal patterns, trends, and
associations, especially relating to human behavior and interactions
Big data is a term that describes the large volume (Terabytes or Petabytes) of
data – both structured and unstructured – that inundates a business on a day-
to-day basis
Big Data is a collection of huge data sets that cannot be handled in
traditional ways
Working with Big Data includes capturing data, data storage, data analysis,
visualization, querying, etc.
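The "traditional way" breaks down once a data set no longer fits in memory on one machine. As a rough, hedged illustration (the record layout and field names below are invented, not from any real system), plain Python can at least process such a stream chunk by chunk, so only a bounded slice is ever held in memory:

```python
# Illustrative only: aggregating a stream of records chunk by chunk,
# so that no more than `chunk_size` records are held in memory at once.
# The generator below stands in for a data source too large to load whole.
from itertools import islice

def record_stream(n):
    """Stand-in for a huge data source (e.g. log lines arriving daily)."""
    for i in range(n):
        yield {"user": f"user{i % 5}", "bytes": 100 + i}

def chunked(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

def total_bytes_per_user(stream, chunk_size=1000):
    """Running aggregation: only one chunk is in memory at a time."""
    totals = {}
    for chunk in chunked(stream, chunk_size):
        for rec in chunk:
            totals[rec["user"]] = totals.get(rec["user"], 0) + rec["bytes"]
    return totals

totals = total_bytes_per_user(record_stream(10_000))
print(len(totals))  # five distinct users were aggregated
```

At real Big Data scale the same idea is distributed: each machine aggregates its own chunks and partial totals are merged, which is exactly what frameworks like Hadoop automate.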
9. Introduction to Hadoop
Hadoop is an open-source data management framework that
supports storing and processing big data
Hadoop is an Apache project. It is used by Yahoo,
Facebook, Twitter, LinkedIn and many more
It allows the distributed processing of huge data sets across clusters
Hadoop was developed by Doug Cutting and Mike Cafarella in
2006
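Hadoop's distributed processing is built on the MapReduce model. The sketch below imitates the classic word-count pattern on a single machine in plain Python; it is not Hadoop's real (Java) API, just an illustration of the map, shuffle and reduce phases that the framework distributes across a cluster:

```python
# Single-machine sketch of the MapReduce word-count pattern that
# Hadoop runs in parallel across many nodes. Not Hadoop's actual API.
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "big data processing"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # → 3
```

In Hadoop, the map tasks run on the nodes where the data blocks live, and only the shuffled intermediate pairs move across the network.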
10. Necessity of Hadoop
Make strategic, confident decisions based on solid data and advanced analytics
Gain valuable business insights that help you pinpoint weaknesses and discover
new opportunities
Earn higher profits by better understanding the business, processes and the
customers
Big Data is flowing in at an exponential rate
An increasing number of Hadoop-driven jobs
11. Hadoop Characteristics
Hadoop Is Easily Scalable
Hadoop Brings Flexibility In Data Processing
Hadoop Is Fault Tolerant
Hadoop Is Great At Faster Data Processing
Hadoop Ecosystem Is Robust
Hadoop Is Very Cost Effective
12. Introduction to Apache Spark
Apache Spark is an open-source, in-memory cluster computing framework.
Apache Spark provides an interface for programming entire clusters with implicit data
parallelism and fault tolerance.
Apache Spark provides high-level APIs in Scala, Java, R and Python.
Spark can be up to 100x faster than Hadoop MapReduce
Spark supports stream processing for large data sets
Version 1.0 released in May 2014
Stable release as of July 2017: v2.2.0
Spark is written primarily in Scala
Operating system support: Microsoft Windows, macOS, Linux
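Spark's high-level API is a chain of transformations over a distributed data set. As a dependency-free sketch, the same map/filter/reduce style can be imitated in plain Python; in real PySpark this pipeline would start from `SparkContext.parallelize` and execute lazily across the cluster, which is not shown here:

```python
# Plain-Python imitation of a Spark-style pipeline. In actual PySpark this
# would be sc.parallelize(data).map(...).filter(...).reduce(...), evaluated
# lazily and in parallel; here it runs eagerly on one machine.
from functools import reduce

data = range(1, 11)

# map: square each element; filter: keep even squares; reduce: sum them
squares = map(lambda x: x * x, data)
even_squares = filter(lambda x: x % 2 == 0, squares)
total = reduce(lambda a, b: a + b, even_squares)

print(total)  # even squares of 1..10: 4 + 16 + 36 + 64 + 100 = 220
```

The appeal of Spark's model is that the same few transformation calls scale from this toy range to terabytes, with the framework handling partitioning and fault tolerance.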
14. Why Spark?
Speed:
Spark performs in-memory computations
It extends the MapReduce model and takes it to a whole other level
Spark can be up to 100x faster than Hadoop MapReduce
Generality:
Spark is able to handle a wide range of workloads
Iterative algorithms
It provides interactive queries and stream processing
Ease of use:
Spark has APIs in Scala, Python and Java
It contains libraries for ML, SQL, streaming and graph processing
Spark runs on Hadoop clusters, Mesos, etc., and can access data sources such as Cassandra
15. Job Opportunities
After the U.S., India has the largest demand for analytics / big data / data science professionals. Amid such
demand, people often find themselves confused about selecting an appropriate job profile for the best future.
“A professional with working knowledge of data science and big data earns 8% more than co-workers without it”
16. Job Opportunities (Cont’d)
89% of hiring managers find it difficult to find talent
47% of employers are willing to pay for professional certifications, up from 33% in 2017
Positions they are looking for:
73% – Developers
60% – DevOps
53% – SysAdmins
Employers are seeking expertise in:
70% – Cloud
67% – Big Data
65% – Linux
18. Next Webinar: Execute your First Hive Project
What is Big Data?
Why do we need Big Data?
What is Hive?
Basic Hive Operations & Commands:
Create database, Show databases, Use, Create table, Show tables, Describe, data
loading into a Hive table from the local filesystem, inserting data into a Hive table,
Select *
Retail domain project execution with Hive:
Use Case #1: Out of 20,000 customers, how many gave a product rating
Use Case #2: Find how many products are available for the brands below:
Puma
Regular
First Choice
Note: Show product details for "Puma"
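HiveQL is close to standard SQL, so both retail use cases boil down to a filtered count and a GROUP BY aggregation. As a hedged stand-in (the table, columns and sample rows below are invented; the real project's schema may differ), Python's built-in sqlite3 can sketch the queries:

```python
# Hedged stand-in for the Hive use cases: HiveQL is SQL-like, so sqlite3
# is used here to sketch the queries. The products table, its columns
# (name, brand, rating) and the rows are invented for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, brand TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Shoe A", "Puma", 4), ("Shoe B", "Puma", None),
     ("Shirt C", "Regular", 5), ("Bag D", "First Choice", None)],
)

# Use Case #1 (sketched): count the rows that received a rating
# (NULL standing in for "no rating given").
rated = conn.execute(
    "SELECT COUNT(*) FROM products WHERE rating IS NOT NULL"
).fetchone()[0]

# Use Case #2 (sketched): number of products per brand.
per_brand = dict(conn.execute(
    "SELECT brand, COUNT(*) FROM products GROUP BY brand"
).fetchall())

print(rated, per_brand["Puma"])
```

In the actual project the same statements would run against Hive tables loaded from the retail data set rather than an in-memory SQLite database.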
19. Hadoop Kick-Starter Course
What is this course about?
Get insights into applications of Big Data and Hadoop while learning to perform basic operations with HDFS,
MapReduce and Hive. The course is bundled with industry-grade hands-on assignments and project access, provided
through a VM environment, to practice what you learn. The program helps you understand the career path in Big Data and
the available learning paths to advance your career options.
Duration: 6 Hrs.
Date: 20th & 21st January, 2018
Time: 07:30 PM to 10:30 PM
Price: ₹ 499
Participants will get access to
Course Content (LMS Access)
10+ Assignments
20+ Quizzes
Pre-Installed Hadoop environment (Plug and Play)
1 Project with 5 Use Cases
20. Hadoop Kick-Starter Course Curriculum
Day #1:
What is Big Data?
Why Do We Need More and More Data?
Big Data Characteristics:
Volume, Velocity, Variety, Veracity
Types of Data
Applications of Big Data
Industries that Generate Big Data
Introduction to Hadoop
Why Hadoop?
Hadoop Ecosystem
YARN
Day #2:
Hive: Introduction
What is Hive and its Limitations?
Hive Architecture
Hive Components
Hive Data Types:
Primary Data Types
Complex Data Types
Various Hive Commands and Operations
Joins in Hive
Project Execution
Editor's Notes 1. Hadoop Brings Flexibility In Data Processing:
One of the biggest challenges organizations have had in the past was handling unstructured data. Hadoop manages data whether it is structured or unstructured, encoded or formatted, or any other type. Hadoop brings value to the table by making unstructured data useful in the decision-making process.
2. Hadoop Is Easily Scalable
This is a huge feature of Hadoop. It is an open source platform and runs on industry-standard hardware. That makes Hadoop an extremely scalable platform where new nodes can easily be added to the system as data volume and processing needs grow, without altering anything in the existing systems or programs.
3. Hadoop Is Fault Tolerant
In Hadoop, data is stored in HDFS, where it automatically gets replicated at two other locations. So even if one or two of the systems collapse, the file is still available on at least a third system. This brings a high level of fault tolerance.
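A hedged toy model of this replication behavior (the node names and block layout below are invented): with three replicas per block, a block stays readable as long as any one replica's node survives:

```python
# Toy model of HDFS-style 3x replication: each block is copied to three
# nodes, so the block survives as long as at least one replica node is up.
REPLICATION_FACTOR = 3

nodes = ["node1", "node2", "node3", "node4"]  # invented cluster
block_locations = {"block_A": nodes[:REPLICATION_FACTOR]}  # node1..node3

def block_available(block, failed_nodes):
    """A block is readable if any node holding a replica is still alive."""
    return any(n not in failed_nodes for n in block_locations[block])

# Two of the three replica nodes fail; the third copy keeps the block readable.
print(block_available("block_A", failed_nodes={"node1", "node2"}))  # True
# All three replica nodes fail: this block's data is gone.
print(block_available("block_A", failed_nodes={"node1", "node2", "node3"}))  # False
```

Real HDFS additionally re-replicates under-replicated blocks onto healthy nodes after a failure, restoring the replica count automatically.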
4. Hadoop Is Great At Faster Data Processing
Hadoop is extremely good at high-volume batch processing because of its ability to do parallel processing. Hadoop can perform batch processes roughly 10 times faster than a single-threaded server or a mainframe.
5. Hadoop Ecosystem Is Robust:
Hadoop has a very robust ecosystem that is well suited to the analytical needs of developers and of small to large organizations. The Hadoop ecosystem comes with a suite of tools and technologies, making it well suited to a variety of data processing needs.
6. Hadoop Is Very Cost Effective
Hadoop generates cost benefits by bringing massively parallel computing to commodity servers, resulting in a substantial reduction in the cost per terabyte of storage, which in turn makes it reasonable to model all your data.