A Discussion on Data Science as a Career Option
- By Anshik
- Under Student Mentorship Program
Overview
As data has multiplied, so has the ability to collect, organize, and analyze it. Data storage is cheaper than ever, processing power is greater than ever, and tools for mining huge amounts of available data for business intelligence are more accessible than ever.
The McKinsey Global Institute predicted that by 2018 the U.S. could face a shortage of 1.5 million people who know how to leverage data analysis to make effective decisions.
Enter: you, taking stock of your three main career options: data analyst, data scientist, and data engineer.
Career options and the differences between them
Data Analyst (1.6L - 8L)
● Solves problems using existing tools.
● No mathematical or research background required.
● Manages the quality of scraped data, queries databases, and serves data as visualizations.
Data Scientist (3.5L - 18L)
● Similar to a data analyst in many respects.
● Responsible for doing undirected research and tackling open-ended problems and questions.
● A data analyst summarizes the past; a data scientist strategizes for the future.
Data Engineer (3L - 21L)
● Does the groundwork for the former two.
● Responsible for compiling and installing database systems, writing complex queries, scaling to multiple machines, and putting disaster recovery systems into place.
Should you put in the time and effort?
What do you think?
Consider a data set that contains the salaries of people who work at an organization.
-- What questions can be formed?
-- What interpretations can be made?
1. Most of the positions seek Master's / PhD students (especially in statistics).
2. Learning from MOOCs is not easy and is time-consuming.
3. Condense what you know in a presentable manner.
On the brighter side...
Srikanth Velamakanni, CEO of CA-headquartered Fractal Analytics:
“In the next few years, the size of the analytics market will evolve to at least one-thirds of the global IT market from the current one-tenths”
Big Data Analytics Job Trends
Key points
● Huge Job Opportunities & Meeting the Skill Gap
● Salary Aspects
● The Rise of Unstructured and Semistructured Data Analytics
● Used Everywhere
Total Enterprise Data Growth 2005-2015
The way we capture, store, analyze, and distribute data is transforming.
Deduplication, compression, and analysis tools are lowering costs.
Tools and
Resources
Categories and Links
Books: ISLR, R for Dummies, Advanced R, Machine Learning for Hackers (Py), NLP with Python
Websites and Blogs: Analytics Vidhya, R-bloggers, Kaggle Scripts, CrowdAnalytics, students.brown.edu, github.io
Statistics and Linear Algebra: Inferential and Descriptive Statistics by Udacity, MSR sir’s Prob & Stats slides, Khan Academy (Linear Algebra)
Machine Learning and AI: Andrew Ng's ML Class, Johns Hopkins Data Analysis, Deepak Khemani (AI, NPTEL)
Data Storage and Visualization: MongoDB (Udacity), D3.js documentation and wiki
Timeline (Weeks) [Beginners]
Week markers: 1, 3, 5, 7, 10, 12, 14, 20
● Learn the language - R/Python
● Start doing hackathons/pet projects
● Practice the language, finish an intro to ML
● Do more advanced ML, start optimizing your code, start reading git commits
Intro to ML & R
Installing Packages :-
To install a package, use the install.packages() function. Once a package is installed, it must be loaded into your current R session with library() or require() before it can be used. Think of this as taking the book off the shelf and opening it up to read.
TIP :- Use require() when you need to check whether a package is available: it returns FALSE (with a warning) if the package is not found, whereas library() stops with an error.
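The difference between the two loaders can be sketched as follows. This is a minimal illustration, not part of the original slides; the package name "notARealPackage123" is a made-up name that is assumed not to be installed, and "stats" is used because it ships with R.

```r
# One-time installation from CRAN (commented out so the sketch runs offline):
# install.packages("ggplot2")

# require() returns TRUE/FALSE instead of stopping, so it can be used in a condition.
has_stats <- require("stats", character.only = TRUE)   # ships with R

# Hypothetical package name, assumed not to be installed.
has_fake <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)

# library() stops with an error for a missing package;
# tryCatch() turns that error into a plain value so the script can continue.
lib_result <- tryCatch(
  {
    library("notARealPackage123", character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)

print(c(has_stats = has_stats, has_fake = has_fake))
print(lib_result)
```

In a script that cannot proceed without a package, library() is usually the safer choice because it fails immediately; require() is more useful when the code has a fallback.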
Data Types :-
R has a number of basic data types.
1. Numeric :- Also known as Double. The default type when dealing with numbers. Examples: 1, 1.0, 42.5
2. Integer :- Examples: 1L, 2L, 42L
3. Complex :- Example: 4 + 2i
4. Logical :- Two possible values: TRUE and FALSE. You can also use T and F, but this is not recommended. NA is also considered logical.
5. Character :- Examples: "a", "Statistics", "1 plus 2."
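The types listed above can be checked directly with typeof(). The variable names below are chosen only for illustration.

```r
# One value of each basic type (names are illustrative only).
x_num <- 42.5
x_int <- 42L
x_cpx <- 4 + 2i
x_lgl <- TRUE
x_chr <- "Statistics"

# typeof() reports the internal storage type of each value.
types <- sapply(list(x_num, x_int, x_cpx, x_lgl, x_chr), typeof)
print(types)

# A plain number is stored as a double, and its class is "numeric".
print(typeof(1))
print(class(1))

# A bare NA is logical by default.
print(typeof(NA))

# Why T and F are discouraged: they are ordinary variables and can be overwritten.
t_after_reassign <- local({
  T <- FALSE      # legal, but now T no longer means TRUE inside this block
  T
})
print(t_after_reassign)
```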
R Object-Oriented Systems
S3
● Lacks a formal class definition
● Objects are created by setting the class attribute
● Attributes are accessed using $
● Methods belong to generic functions
● Follows copy-on-modify semantics
S4
● Class defined using setClass()
● Objects are created using new()
● Attributes are accessed using @
● Methods belong to generic functions
● Follows copy-on-modify semantics
Reference Classes
● Class defined using setRefClass()
● Objects are created using generator functions
● Attributes are accessed using $
● Methods belong to the class
● Does not follow copy-on-modify semantics
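A small sketch of the three systems side by side. The class names (person3, Person4, Person5), the generics greet() and greet4(), and the rename helpers are hypothetical and exist only for this illustration.

```r
library(methods)  # provides S4 and Reference Classes

# --- S3: no formal definition, just a list with a class attribute ---
p3 <- structure(list(name = "Asha"), class = "person3")
greet <- function(x, ...) UseMethod("greet")            # generic function
greet.person3 <- function(x, ...) paste("Hello,", x$name)

# --- S4: formal class, created with new(), slots accessed with @ ---
setClass("Person4", slots = c(name = "character"))
p4 <- new("Person4", name = "Asha")
setGeneric("greet4", function(x) standardGeneric("greet4"))
setMethod("greet4", "Person4", function(x) paste("Hello,", x@name))

# --- Reference Classes: methods belong to the class, objects are mutable ---
Person5 <- setRefClass(
  "Person5",
  fields = list(name = "character"),
  methods = list(
    rename = function(new_name) {
      name <<- new_name        # modifies the object in place
      invisible(.self)
    }
  )
)
p5 <- Person5$new(name = "Asha")

# Copy-on-modify (S3): the function changes its own copy, not p3.
rename_s3 <- function(p) {
  p$name <- "Changed"
  p
}
invisible(rename_s3(p3))
print(p3$name)    # unchanged

# Reference semantics (RC): alias and p5 are the same object.
alias <- p5
alias$rename("Changed")
print(p5$name)    # changed through the alias

print(greet(p3))
print(greet4(p4))
```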
Simple Linear Regression using R
We will use the built-in cars dataset that ships with base R (in the datasets package).
The data were gathered during the 1920s and record the speed of cars and the resulting distance it takes for the car to come to a stop.
Objective :- How far does a car travel before stopping when traveling at a certain speed?
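Before modelling, it helps to look at the data. A quick inspection, using only base R functions, might look like this (per the dataset's help page, speed is in mph and dist is in feet):

```r
# cars is available without loading anything: it lives in the datasets package.
str(cars)       # structure: column names and types
head(cars)      # first six rows

dims <- dim(cars)                 # number of rows and columns
speed_range <- range(cars$speed)  # slowest and fastest recorded speeds

print(dims)
print(speed_range)
summary(cars)   # five-number summary and mean for each column
```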
What sort of function should we use for f(X) in the model Y = f(X) + ϵ for the cars data?
- A horizontal line?
We see this doesn’t seem to do a very good job. Many of the data points are very far from the orange line representing the constant c. This is an example of underfitting.
- Make f(x) depend on x
- As speed increases, the distance required to come to a stop increases. There is still some variation about this line, but it seems to capture the overall trend.
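One way to make the comparison concrete is to fit both candidates and compare their residual sums of squares. This is a sketch rather than the slide's own code; the names flat_model and line_model are chosen here for illustration.

```r
# Horizontal line: the prediction does not depend on speed (intercept-only model).
flat_model <- lm(dist ~ 1, data = cars)

# Straight line in speed: f(x) = b0 + b1 * x.
line_model <- lm(dist ~ speed, data = cars)

# Residual sum of squares: smaller means the fitted values sit closer to the data.
rss_flat <- sum(residuals(flat_model)^2)
rss_line <- sum(residuals(line_model)^2)

# The best horizontal line is simply the mean stopping distance.
flat_height <- unname(coef(flat_model))

print(c(rss_flat = rss_flat, rss_line = rss_line))
print(c(flat_height = flat_height, mean_dist = mean(cars$dist)))
```

The much larger residual sum of squares for the horizontal line is the numerical counterpart of the underfitting described above.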
Assumptions of Linear Regression
LINE
Linear. The relationship between Y and x is linear, of the form β0 + β1x.
Independent. The errors ϵ are independent.
Normal. The errors ϵ are normally distributed. That is, the “error” around the line follows a normal distribution.
Equal Variance. At each value of x, the variance of Y is the same, σ².
We have to find the line that minimizes the sum of the squared vertical distances from the points to the line.
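For simple linear regression, the least-squares line has a closed-form solution: the slope is Sxy / Sxx and the intercept is ȳ minus the slope times x̄. The sketch below computes these by hand and compares them with lm(); the variable names are illustrative.

```r
x <- cars$speed
y <- cars$dist

# Sums of cross-products and squares around the means.
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)

# Closed-form least-squares estimates.
beta_1_hat <- Sxy / Sxx                        # slope
beta_0_hat <- mean(y) - beta_1_hat * mean(x)   # intercept

by_hand <- c(beta_0_hat, beta_1_hat)
by_lm   <- unname(coef(lm(dist ~ speed, data = cars)))

print(by_hand)
print(by_lm)

# The quantity being minimized: the sum of squared vertical distances.
sse <- sum((y - (beta_0_hat + beta_1_hat * x))^2)
print(sse)
```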
lm()
stop_dist_model = lm(dist ~ speed, data = cars)
The abline() function is used to add lines of the form a+bx to a plot. (Hence abline.) When we give it stop_dist_model as an argument, it automatically extracts the regression coefficient estimates (β̂0 and β̂1) and uses them as the intercept and slope of the line. The lwd argument modifies the width of the line, and col modifies its color.
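A plotting call along these lines would draw the scatterplot with the fitted line on top. The axis labels, title, point style, and colors are choices made for this sketch, not taken from the slides.

```r
# Fit the model (repeated here so the example is self-contained).
stop_dist_model <- lm(dist ~ speed, data = cars)

# Scatterplot of the raw data.
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed",
     pch  = 20,
     col  = "grey")

# Add the fitted regression line: abline() reads the intercept and slope from the model.
abline(stop_dist_model, lwd = 3, col = "darkorange")

# The same two numbers abline() used.
fitted_coefs <- coef(stop_dist_model)
print(fitted_coefs)
```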
The lm() function returns an object of class "lm".
We can access its members using the $ operator:
> names(stop_dist_model)
> stop_dist_model$residuals
Use summary() to summarize the output of the linear regression. The summary() command also returns a list, and we can again use names() to learn about the elements of this list.
> names(summary(stop_dist_model))
> summary(stop_dist_model)$r.squared
Use the predict() function to predict the output for given input values:
> predict(stop_dist_model, data.frame(speed = c(8, 21, 50)))
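The prediction call above can be expanded slightly to flag which inputs lie inside the range of observed speeds. This is a sketch with illustrative variable names; the point it makes is that a prediction far outside the observed range (such as speed = 50) is an extrapolation and should be treated with caution.

```r
stop_dist_model <- lm(dist ~ speed, data = cars)

new_speeds <- data.frame(speed = c(8, 21, 50))

# Point predictions: intercept + slope * speed for each new speed.
preds <- predict(stop_dist_model, newdata = new_speeds)

# Flag inputs that fall inside the range of speeds seen in the data.
in_range <- new_speeds$speed >= min(cars$speed) &
            new_speeds$speed <= max(cars$speed)

print(cbind(new_speeds, dist_hat = preds, in_range = in_range))

# Prediction intervals quantify the uncertainty of a single new observation.
pred_int <- predict(stop_dist_model, newdata = new_speeds,
                    interval = "prediction", level = 0.95)
print(pred_int)

# Proportion of variance in dist explained by the model.
r_squared <- summary(stop_dist_model)$r.squared
print(r_squared)
```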
Thank You
-Anshik
8826274098 (WhatsApp)
