Data Science
APPLICATION AND OPPORTUNITY
Prepared By: Tarun Sukhani
WHAT IS DATA SCIENCE & BIG DATA?
Data Science is an interdisciplinary field that combines statistics, computer science, and operations research. It has numerous applications in fields such as fintech, genomics, and even the social sciences, to name just a few.
Big Data is data science applied to large data sets, usually in the terabyte range and above. It has its roots in Web 2.0, whose emphasis on user-generated content increased the variety, volume, and velocity of data.
DATA SCIENCE CORE COMPONENTS
BIG DATA – THE 4 V’S
BIG DATA – UNPRECEDENTED GROWTH
WHAT IS A DATA SCIENTIST?
DATA SCIENCE VENN DIAGRAM
Hacking Skills
A proper mathematical background and domain expertise may not be sufficient to succeed as a data scientist. The ability to combine different tools and visualizations is key to becoming an effective data scientist.
Math & Statistics
Computer science, math, statistics, and linear algebra provide a solid foundation from which a data scientist can draw the knowledge needed to analyze data sets.
SME & Job Experience
There is no substitute for solid work experience as a business analyst, programmer, and/or statistician in the domain where you are applying your skills and knowledge. Without such experience, you risk building biased statistical models or drawing irrelevant conclusions.
WHAT DOES A GOOD DATA SCIENTIST LOOK LIKE?
• Inquisitive – skeptical and curious
• Knowledgeable – knows machine learning, statistics, and probability
• Scientific Method – creates hypotheses, tests them, and updates understanding
• Coding – is good at coding, hacking, and general programming
• Product Oriented – knows how to build data products and visualizations that make data understandable to mere mortals
• Domain Knowledge – understands the business and how to tell the relevant story from business data; able to find answers to known unknowns
T-SHAPED SKILLSET
Broad-range Generalist
Deep Expertise
Machine Learning, Statistics, Domain Knowledge
DATA SCIENTIST ROLES
DATA SCIENTIST ROADMAP
DEMAND & OPPORTUNITY
The Harvard Business Review (Thomas H. Davenport and D.J. Patil, October 2012) called the data scientist…
“The Sexiest Job of the 21st Century”
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
The New York Times (April 11, 2013) described data science as a…
“hot new field [that] promises to revolutionize
industries from business to government,
healthcare to academia”
Data Science, however, is NOT NEW! It’s basically just data mining rebranded.
DEMAND & OPPORTUNITY
Glassdoor identified Data Scientist as the top job for work-life balance in 2015 (out of 25 jobs), with the highest salary on the list (USA figures):
1. Data Scientist
• Work-Life Balance Rating: 4.2 (out of 5)
• Salary: $114,808 (highest salary)
• Number of Job Openings: 1,315 (highest in the top 9)
https://www.glassdoor.com/blog/25-jobs-worklife-balance-2015/
According to McKinsey, there will be a shortage of the talent needed to take advantage of data science and big data. By 2018, the USA alone could face a shortage of 140,000 to 190,000 skilled data scientists and 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
DATA SCIENCE PRINCIPLES
1. Socio-Technical Systems are complex!
2. Data is never at rest
3. Data is dirty, deal with it!
4. SVoT = LOL! (Single Version of Truth)
5. Data munging/wrangling & data wrestling take > 70% of the time – this is the reality of the data scientist
6. Simplification. Reduction. Distillation.
7. Curiosity. Empiricism. Skepticism.
KNOWNS AND UNKNOWNS
There are known knowns. These are things we know that we know.
There are known unknowns. That is to say, there are things that we know
we don’t know.
But there are also unknown unknowns. There are things we don’t know we don’t know.
Donald Rumsfeld
DIKUW
APPLICATIONS OF DATA SCIENCE
Data-Driven Decision Making (DDD) refers to the practice of basing decisions on
data, rather than purely on intuition.
Source: Data Science for Business, O’Reilly Media
APPLICATIONS OF DATA SCIENCE
PROCESS FLOW DIAGRAM
APPLICATIONS OF DATA SCIENCE
BUSINESS
APPLICATIONS OF DATA SCIENCE
SPORTS
APPLICATIONS OF DATA SCIENCE
HEALTHCARE
APPLICATIONS OF DATA SCIENCE
RETAIL
APPLICATIONS OF DATA SCIENCE
RESEARCH
DATA-DRIVEN ORGANIZATION
Organizations become data-driven by developing data products.
What is a data product?
• Curated and crafted from raw data
• A result of exploration and iterations
• A machine that learns from data
• An answer to known unknowns or unknown unknowns
• A mechanism that triggers immediate business value
• A probabilistic window of future events or behavior
DEVELOPING DATA PRODUCTS
OBJECTIVES
What outcome am I trying to achieve?
LEVERS
What inputs can we control?
DATA
What data can we collect?
MODELS
How do the levers influence the objectives?
© Tarun Sukhani
DEVELOPING DATA PRODUCTS
THE WORLD
1. Product Manufactured
2. Goods shipped
3. Product purchased
4. Phone Calls Made
5. Energy Consumed
6. Fraud Committed
7. Repair Requested
8. System
INGEST RAW DATA
1. Transactions
2. Web-scraping
3. Web-clicks & logs
4. Sensor data
5. Mobile data
6. Docs, Email, XLS
7. Social Feeds, RSS
8. Flume & Sqoop
MUNGE DATA
1. MapReduce
2. ETL/ELT
3. Data Wrangle
4. Data Cleansing
5. Dimensionality Reduction
6. Sample
7. Select, Join, Bind
THE DATASET
1. Independence?
2. Correlation?
3. Covariance?
4. Causality?
5. Dimensionality?
6. Missing Values?
7. Relevancy?
1. Known Unknowns?
2. We’d like to know…
3. Outcomes?
4. What data?
5. Hypothesis?
DEVELOPING DATA PRODUCTS
LEARN FROM DATA
1. Description & Inference
2. Data & Algorithm Models
3. Machine Learning
4. Networks & Graphs
5. Regression & Prediction
6. Classification & Clustering
7. Experiments & Iteration
DATA PRODUCT
1. Objectives
2. Levers
3. Modeling
4. Simulation
5. Optimization
6. Visualization
VISUALIZE INSIGHT
1. Actionable
2. Predictive
3. Immediate Impact
4. Business Value
5. Easy to Explain
DELIVER INSIGHT
EXPLORE DATA
THE DATASET
REPRESENT DATA
DISCOVER DATA
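For the clustering item under LEARN FROM DATA, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The function name, the random initialisation from the input points and the toy 2-D points are illustrative assumptions; library implementations usually add smarter initialisation such as k-means++ and multiple restarts.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 20, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments can no longer change: converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (7.8, 8.2)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

k must be chosen up front, and results can depend on initialisation, which is why the seed is a parameter here.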
DEVELOPING DATA PRODUCTS
Objective: What outcome am I trying to achieve?
DATA → MODELER → SIMULATOR → OPTIMIZER → Actionable Outcome
The Model Assembly Line
DATA SCIENCE AS A CAREER
DJ Patil, Chief Data Scientist of the United States, is the perfect prototype of the data scientist. He brings a deep understanding of mathematics from his Ph.D. in applied mathematics. He has created multiple data products and collaborated with people in various data science roles. He has headed up strategy and led teams that built entirely new extensions of LinkedIn’s data, from “People You May Know” to Talent Match, a feature that automatically sources the best candidates for any job posted on LinkedIn.
Doug Cutting, creator of Hadoop and Chief Architect at Cloudera, has dedicated his time to creating technical solutions to store and process data at scale. Hadoop is widely used to distribute data across multiple servers so that huge data sets become manageable. Doug Cutting is the prototypical example of a data engineer, and Cloudera, where he is chief architect, is one of the largest data engineering organizations in the world.
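Hadoop popularised the MapReduce programming model for processing data spread across many machines. The single-process sketch below only illustrates that model with the classic word count; it is not Hadoop's API, and the function names are hypothetical. In a real cluster each input split would be mapped on a different node and the framework would perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict:
    """Shuffle: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Here the splits are simply mapped in a loop instead of on separate nodes.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["big data is big", "data is never at rest"])
print(counts)
```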
DATA SCIENCE EDUCATION FRAMEWORK
LEARN TO CODE
HIGH-LEVEL: PYTHON, R, JULIA
LOWER-LEVEL: JAVA, SCALA/CLOJURE, C++/GO
DATA SCIENCE EDUCATION FRAMEWORK
LEARN MATHEMATICS & STATISTICS
MATHEMATICS
• LINEAR ALGEBRA (MATRIX FACTORIZATION)
• CALCULUS (INTEGRALS, DERIVATIVES, ETC.)
• GRAPH THEORY
• PROBABILITY/COMBINATORICS
STATISTICAL ANALYSIS
• DISTRIBUTIONS (BINOMIAL, NORMAL, POISSON, ETC.)
• SUMMARY STATISTICS (MEAN, VARIANCE, ETC.)
• HYPOTHESIS TESTING (P-VALUE, CHI-SQUARE, ETC.)
• BAYESIAN ANALYSIS
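As a small worked example of hypothesis testing with a chi-square statistic and a p-value, the sketch below runs Pearson's chi-square test of independence on a 2x2 table. The A/B conversion counts are hypothetical. It uses the fact that a 2x2 table has one degree of freedom, where the chi-square statistic is the square of a standard normal variable, so the p-value is `erfc(sqrt(stat / 2))`; no continuity correction is applied, and the approximation assumes every expected count is at least about 5.

```python
import math


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are variants A and B, columns are (converted, did not convert).
observed = [[30, 70], [50, 50]]
stat, p = chi_square_2x2(observed)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

Here the statistic is 25/3 and the p-value is well below 0.01, so at that level the hypothesis that conversion is independent of the variant would be rejected.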
DATA SCIENCE EDUCATION FRAMEWORK
LEARN MACHINE LEARNING AND SOFTWARE ENGINEERING
MACHINE LEARNING
• SUPERVISED (SVM, RANDOM FOREST)
• UNSUPERVISED (K-MEANS, LDA)
• NLP/INFORMATION RETRIEVAL
• VALIDATION, MODEL COMPARISON
SOFTWARE ENGINEERING
• ALGORITHMS & DATA STRUCTURES
• DATA VISUALIZATION
• DATA MUNGING/WRANGLING
• DISTRIBUTED COMPUTING
DATA SCIENCE EDUCATION FRAMEWORK
YOU DON’T NEED A PHD TO DO DATA SCIENCE!
DEMO & Q/A

iTrain Malaysia: Data Science by Tarun Sukhani

  • 1. Data Science APPLICATION AND OPPORTUNITY Prepared By: Tarun Sukhani
  • 2. WHAT IS DATA SCIENCE & BIG DATA? Data Science is an interdisciplinary field that combines statistics, computer science, and operations research. It has numerous applications in fields such as Fintech, Genomics, and even the Social Sciences, to name just a few. Big Data is data science applied to large data sets, usually in the terabyte range and above. It has its roots in Web 2.0, which emphasized user-generated content and thus produced data of greater variety, volume, and velocity.
  • 4. BIG DATA – THE 4 V’S
  • 5. BIG DATA – UNPRECEDENTED GROWTH
  • 6. WHAT IS A DATA SCIENTIST?
  • 7. DATA SCIENCE VENN DIAGRAM Hacking Skills Having a proper mathematical background and domain expertise may not be sufficient to succeed as a data scientist. The ability to combine different tools and visualizations is key to becoming an effective data scientist. Math & Statistics Computer Science, Math, Statistics, and Linear Algebra provide a solid foundation from which a data scientist can draw the knowledge needed to analyze data sets. SME & Job Experience There is no substitute for solid work experience as a business analyst, programmer, and/or statistician in the domain where you are applying your skills and knowledge. Without such experience, you risk building biased statistical models or drawing irrelevant conclusions.
  • 8. WHAT DOES A GOOD DATA SCIENTIST LOOK LIKE? Inquisitive – skeptical and curious Knowledgeable – knows machine learning, statistics, and probability Scientific Method – creates hypotheses, tests them, and updates understanding Coding – is good at coding, hacking, and general programming Product Oriented – knows how to build data products and visualizations to make data understandable to mere mortals Domain Knowledge – understands the business and how to tell the relevant story from business data; able to find answers to known unknowns.
  • 9. T-SHAPED SKILLSET Broad-range Generalist Deep Expertise Machine Learning, Statistics, Domain Knowledge
  • 12. DEMAND & OPPORTUNITY Data Science has been dubbed by the Harvard Business Review (Thomas H. Davenport and D.J. Patil, October 2012) as… “The Sexiest Job of the 21st Century” https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century And by the New York Times (April 11, 2013) as a… “hot new field [that] promises to revolutionize industries from business to government, healthcare to academia” Data Science, however, is NOT NEW! It’s basically just data mining rebranded.
  • 13. DEMAND & OPPORTUNITY Data Scientist was identified by Glassdoor as the top job for Work-Life Balance in 2015 (out of 25), with the highest salary (in the USA): 1. Data Scientist • Work-Life Balance Rating: 4.2 (out of 5) • Salary: $114,808 (highest salary) • Number of Job Openings: 1,315 (highest in the top 9) https://www.glassdoor.com/blog/25-jobs-worklife-balance-2015/ According to McKinsey, there will be a shortage of the talent needed to take advantage of data science and big data. By 2018, the USA alone could face a shortage of 140,000 to 190,000 skilled data scientists and 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions. http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation
  • 14. DATA SCIENCE PRINCIPLES 1. Socio-Technical Systems are complex! 2. Data is never at rest 3. Data is dirty, deal with it! 4. SVoT = LOL! (Single Version of Truth) 5. Data munging/wrangling & data wrestling > 70% of time – this is the reality of the data scientist 6. Simplification. Reduction. Distillation. 7. Curiosity. Empiricism. Skepticism.
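To make principles 3 and 5 concrete, here is a minimal Python sketch of a typical wrangling pass: parse, trim, normalize, flag missing values, drop malformed rows, and deduplicate. The records, the (name, age, city) layout, and the function name `clean_records` are all hypothetical; a real pipeline would normally use a proper CSV parser or a dataframe library rather than `str.split`.

```python
# Hypothetical raw rows, as they might arrive from a web form or CSV export.
RAW_RECORDS = [
    " Alice , 34, KL ",
    "bob,n/a,Penang",
    "Alice,34,KL",
    "carol,,Johor",
    "dave,29",             # malformed: missing the city field
    "Eve,41,Kuala Lumpur",
]


def clean_records(raw_rows):
    """Parse, normalize, and deduplicate comma-separated rows.

    Returns a list of (name, age, city) tuples; age is None when missing.
    """
    seen = set()
    cleaned = []
    for row in raw_rows:
        fields = [f.strip() for f in row.split(",")]  # trim stray whitespace
        if len(fields) != 3:
            continue  # drop rows with the wrong number of fields
        name, age_text, city = fields
        name = name.title()  # normalize capitalization ("bob" -> "Bob")
        # Treat empty strings and placeholders such as "n/a" as missing.
        age = int(age_text) if age_text.isdigit() else None
        record = (name, age, city)
        if record in seen:
            continue  # drop exact duplicates that only differed in formatting
        seen.add(record)
        cleaned.append(record)
    return cleaned


for record in clean_records(RAW_RECORDS):
    print(record)
```

Note that "KL" and "Kuala Lumpur" survive as two spellings of the same city: even after cleaning, there is no single version of truth until someone decides on a canonical form (principle 4).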
  • 15. KNOWNS AND UNKNOWNS There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know. Donald Rumsfeld
  • 16. DIKUW
  • 18. APPLICATIONS OF DATA SCIENCE Data-Driven Decision Making (DDD) refers to the practice of basing decisions on data, rather than purely on intuition. Data Science for Business, O’Reilly Media
  • 19. APPLICATIONS OF DATA SCIENCE PROCESS FLOW DIAGRAM
  • 20. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 21. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 22. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 23. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 24. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 25. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 26. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 27. APPLICATIONS OF DATA SCIENCE BUSINESS
  • 28. APPLICATIONS OF DATA SCIENCE SPORTS
  • 29. APPLICATIONS OF DATA SCIENCE HEALTHCARE
  • 30. APPLICATIONS OF DATA SCIENCE RETAIL
  • 31. APPLICATIONS OF DATA SCIENCE RETAIL
  • 32. APPLICATIONS OF DATA SCIENCE RESEARCH
  • 33. DATA-DRIVEN ORGANIZATION Organizations become data-driven by developing data products. What is a data product? • Curated and crafted from raw data • A result of exploration and iterations • A machine that learns from data • An answer to known unknowns or unknown unknowns • A mechanism that triggers immediate business value • A probabilistic window of future events or behavior
  • 34. DEVELOPING DATA PRODUCTS OBJECTIVES What outcome am I trying to achieve? LEVERS What inputs can we control? DATA What data can we collect? MODELS How do the levers influence the objectives?
  • 35. © Tarun Sukhani DEVELOPING DATA PRODUCTS THE WORLD 1. Product Manufactured 2. Goods shipped 3. Product purchased 4. Phone Calls Made 5. Energy Consumed 6. Fraud Committed 7. Repair Requested 8. System INGEST RAW DATA 1. Transactions 2. Web-scraping 3. Web-clicks & logs 4. Sensor data 5. Mobile data 6. Docs, Email, XLS 7. Social Feeds, RSS 8. Flume & Sqoop MUNCH DATA 1. MapReduce 2. ETL/ELT 3. Data Wrangle 4. Data Cleansing 5. Dim. Reduction 6. Sample 7. Select, Join, Bind THE DATASET 1. Independency? 2. Correlation? 3. Covariance? 4. Causality? 5. Dimensionality? 6. Missing Values? 7. Relevancy? 1. Known Unknowns? 2. We’d like to know… 3. Outcomes? 4. What data? 5. Hypothesis?
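Several of the questions asked of THE DATASET (missing values, covariance, correlation) can be checked with a few lines of code. The sketch below is illustrative only: the column names `ad_spend` and `sales` and their values are invented, missing values are represented as `None`, and incomplete rows are simply dropped (listwise deletion), which is just one of several possible strategies.

```python
import math

# Hypothetical toy dataset; None marks a missing value.
dataset = {
    "ad_spend": [10.0, 20.0, 30.0, None, 50.0],
    "sales":    [12.0, 24.0, 33.0, 41.0, 55.0],
}


def missing_count(column):
    """Number of missing (None) entries in a column."""
    return sum(1 for v in column if v is None)


def complete_pairs(xs, ys):
    """Keep only rows where both values are present (listwise deletion)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def covariance(pairs):
    """Sample covariance with an n - 1 denominator."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def correlation(pairs):
    """Pearson correlation: covariance divided by both standard deviations."""
    # A variable's variance is its covariance with itself.
    var_x = covariance([(x, x) for x, _ in pairs])
    var_y = covariance([(y, y) for _, y in pairs])
    return covariance(pairs) / math.sqrt(var_x * var_y)


for name, column in dataset.items():
    print(name, "missing:", missing_count(column))

pairs = complete_pairs(dataset["ad_spend"], dataset["sales"])
print("covariance:", covariance(pairs))
print("correlation:", round(correlation(pairs), 4))
```

A correlation close to 1 answers "Correlation?" but not "Causality?": it shows that the two columns move together, not that one drives the other.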
  • 36. DEVELOPING DATA PRODUCTS LEARN FROM DATA 1. Description & Inference 2. Data & Algorithm Models 3. Machine Learning 4. Networks & Graphs 5. Regression & Prediction 6. Classification & Clustering 7. Experiments & Iteration DATA PRODUCT 1. Objectives 2. Levers 3. Modeling 4. Simulation 5. Optimization 6. Visualization VISUALIZE INSIGHT 1. Actionable 2. Predictive 3. Immediate Impact 4. Business Value 5. Easy to Explain DELIVER INSIGHT EXPLORE DATA THE DATASET REPRESENT DATA DISCOVER DATA
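As one example of Classification from the LEARN FROM DATA stage, combined with Experiments & Iteration, the sketch below implements a k-nearest-neighbours classifier and scores it on a held-out test set. k-NN is chosen here only for brevity; the feature meanings, the labels "stay"/"churn", and all data points are hypothetical. It needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

# Hypothetical labelled data: (feature vector, label),
# e.g. (visits per week, monthly spend) -> did the customer churn?
TRAIN = [
    ((1.0, 2.0), "stay"), ((2.0, 1.5), "stay"), ((1.5, 1.0), "stay"),
    ((6.0, 7.0), "churn"), ((7.0, 6.5), "churn"), ((6.5, 8.0), "churn"),
]
# Held-out examples, never used for "training", only for evaluation.
TEST = [((1.2, 1.8), "stay"), ((6.8, 7.2), "churn"), ((2.5, 2.0), "stay")]


def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest training examples."""
    neighbours = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out examples that are classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


print("prediction for (1.2, 1.8):", knn_predict(TRAIN, (1.2, 1.8)))
print("held-out accuracy:", accuracy(TRAIN, TEST))
```

Iterating would mean changing `k`, the features, or the distance measure and re-checking held-out accuracy; with only three test points, the score here is far too noisy to compare models seriously.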
  • 37. DEVELOPING DATA PRODUCTS DATA MODELER SIMULATOR OPTIMIZER What Outcome Am I Trying to Achieve? Actionable Outcome The Model Assembly Line
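The model assembly line (modeler, simulator, optimizer) can be illustrated with a toy pricing problem in which the objective is profit and the lever is price. This is a minimal sketch under strong assumptions: the price/units history and the unit cost are invented, demand is assumed to be linear in price, and the optimizer is a plain grid search over candidate prices.

```python
# Hypothetical historical observations: (price, units sold).
HISTORY = [(8.0, 120.0), (10.0, 100.0), (12.0, 80.0), (14.0, 60.0)]
UNIT_COST = 5.0  # hypothetical cost per unit


def modeler(history):
    """Fit a linear demand curve units = a - b * price by least squares."""
    n = len(history)
    mean_p = sum(p for p, _ in history) / n
    mean_u = sum(u for _, u in history) / n
    spp = sum((p - mean_p) ** 2 for p, _ in history)
    spu = sum((p - mean_p) * (u - mean_u) for p, u in history)
    b = -spu / spp            # demand falls as price rises, so b > 0 here
    a = mean_u + b * mean_p
    # Assumption: demand cannot go negative, so clip at zero.
    return lambda price: max(a - b * price, 0.0)


def simulator(demand_model, unit_cost):
    """Return a function that simulates the objective (profit) for a price."""
    def profit(price):
        return (price - unit_cost) * demand_model(price)
    return profit


def optimizer(profit_fn, candidate_prices):
    """Pick the lever setting that maximizes the simulated objective."""
    return max(candidate_prices, key=profit_fn)


demand = modeler(HISTORY)
profit = simulator(demand, UNIT_COST)
candidates = [5.0 + 0.5 * i for i in range(31)]  # prices 5.0 .. 20.0
best_price = optimizer(profit, candidates)
print(best_price, profit(best_price))  # 12.5 562.5
```

With these made-up numbers, the fitted demand curve is units = 200 - 10 × price, and the optimizer settles on a price of 12.5. A real data product would replace each stage with something richer (a validated predictive model, a more realistic simulation, a proper optimization routine), but the hand-off between the three stages stays the same.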
  • 38. DATA SCIENCE AS A CAREER
  • 39. DATA SCIENCE AS A CAREER DJ Patil, Chief Data Scientist of the United States, is the perfect prototype of the Data Scientist. He brings a deep understanding of mathematics from his Ph.D. in applied mathematics. He has created multiple data products and collaborated with people in various data science roles. He has headed up strategy and led teams to build out entire new extensions of LinkedIn’s data, from the creation of “People You May Know” to Talent Match, a function that automatically sources the best candidate for any job posted on LinkedIn. Doug Cutting, Creator of Hadoop & Chief Architect at Cloudera, is somebody who has dedicated his time to creating technical solutions to store and process data at scale. Hadoop is widely used to distribute data across many hardware servers so that huge data sets become manageable. Doug Cutting is the prototypical example of a data engineer, and he is now the chief architect at Cloudera, one of the largest data engineering organizations in the world.
  • 40. DATA SCIENCE EDUCATION FRAMEWORK LEARN TO CODE PYTHON R JULIA HIGH-LEVEL LOWER-LEVEL JAVA SCALA/CLOJURE C++/GO
  • 41. DATA SCIENCE EDUCATION FRAMEWORK LEARN MATHEMATICS & STATISTICS MATHEMATICS STATISTICAL ANALYSIS LINEAR ALGEBRA (MATRIX FACTORIZATION) CALCULUS (INTEGRALS, DERIVATIVES, ETC) GRAPH THEORY PROBABILITY/COMBINATORICS DISTRIBUTIONS (BINOMIAL, NORMAL, POISSON, ETC) SUMMARY STATISTICS (MEAN, VARIANCE, ETC) HYPOTHESIS TESTING (P-VALUE, CHI-SQUARE, ETC) BAYESIAN ANALYSIS
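Several items from this framework (summary statistics, combinatorics, the binomial distribution, and hypothesis testing with a p-value) fit into one small example using only Python's standard library (`statistics` and `math.comb`, Python 3.8+). The sample values and the "60 heads in 100 flips" scenario are hypothetical, and the test shown is an exact two-sided binomial test, chosen for illustration rather than taken from the slide.

```python
import math
import statistics

# Hypothetical sample for summary statistics.
SAMPLE = [2, 4, 4, 4, 5, 5, 7, 9]


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), built from the combinatorial coefficient."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binomial_p_value(k, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    observed = binomial_pmf(k, n, p)
    return sum(
        binomial_pmf(i, n, p)
        for i in range(n + 1)
        if binomial_pmf(i, n, p) <= observed * (1 + 1e-9)  # tolerance for float noise
    )


print("mean:", statistics.mean(SAMPLE))
print("population variance:", statistics.pvariance(SAMPLE))
print("sample variance:", statistics.variance(SAMPLE))

# Null hypothesis: the coin is fair (p = 0.5); we observed 60 heads in 100 flips.
p_value = two_sided_binomial_p_value(60, 100)
print("p-value:", round(p_value, 3))  # 0.057
```

Because roughly 0.057 is above the conventional 0.05 threshold, this test would not reject the hypothesis that the coin is fair; the threshold itself is a convention, not a law of nature.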
  • 42. DATA SCIENCE EDUCATION FRAMEWORK LEARN MACHINE LEARNING AND SOFTWARE ENGINEERING MACHINE LEARNING SOFTWARE ENGINEERING SUPERVISED (SVM, RANDOM FOREST) UNSUPERVISED (K-MEANS, LDA) NLP/INFORMATION RETRIEVAL VALIDATION, MODEL COMPARISON ALGORITHMS & DATA STRUCTURES DATA VISUALIZATION DATA MUNGING/WRANGLING DISTRIBUTED COMPUTING
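To illustrate the unsupervised entry (K-MEANS), here is a minimal sketch of Lloyd's algorithm for k-means in plain Python (3.8+ for `math.dist`). The points and starting centroids are hypothetical and chosen by hand; practical implementations use smarter initialization (such as k-means++), a convergence check instead of a fixed iteration count, and several restarts.

```python
import math

# Hypothetical 2-D points forming two visible groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = list(initial_centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans(POINTS, [(0.0, 0.0), (10.0, 10.0)])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Because k-means only finds a local optimum, the result can depend on the starting centroids; the VALIDATION, MODEL COMPARISON item on the same slide is where choices like the number of clusters would be checked.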
  • 43. DATA SCIENCE EDUCATION FRAMEWORK YOU DON’T NEED A PHD TO DO DATA SCIENCE!