Data Science Intro
Regress
[v. ri-gress]
1. to move backward
Albert Anthony D. Gavino, Data Scientist
Topics
CRISP-DM model
Phases of CRISP-DM
History of Data Mining
CRISP-DM 1.0
Phases of CRISP-DM 1.0
Reverend Thomas Bayes (1763)
Reverend Thomas Bayes
was an English statistician,
philosopher and Presbyterian
minister who is known for
having formulated a specific
case of the theorem that
bears his name: Bayes'
theorem.
1763: his paper was published posthumously
Real-world application: Uber
Priors used by Uber:
Rider prior: what is known about the individual user
Popular place prior: restaurants, nightlife, museums
Uber prior: the places Uber riders in general tend to go
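A minimal sketch of how a prior like the ones above could be combined with new evidence through Bayes' theorem. The destination categories come from the slide; the probabilities, the "late evening drop-off" evidence, and the function name `posterior` are hypothetical and only illustrate the arithmetic, not Uber's actual model.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem over a discrete set of hypotheses.

    prior[h]      = P(h) before seeing the evidence
    likelihood[h] = P(evidence | h)
    returns       P(h | evidence) for every hypothesis h
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(evidence), the normalizing constant
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical prior over where a rider is headed.
prior = {"restaurant": 0.5, "nightlife": 0.3, "museum": 0.2}

# Hypothetical likelihood of a late-evening drop-off for each destination type.
likelihood = {"restaurant": 0.2, "nightlife": 0.6, "museum": 0.1}

post = posterior(prior, likelihood)
most_likely = max(post, key=post.get)
```

With these made-up numbers the evidence shifts the most probable destination from "restaurant" (the prior favourite) to "nightlife".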
Legendre & Gauss (1805)
In 1805 Adrien-Marie Legendre published the method of
least squares; Carl Friedrich Gauss published his own
account in 1809. Both applied it to determining the
orbits of bodies about the Sun. The method is still
used to compute the unknown parameters of the general
regression model.
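As an illustration of what "computing the unknown parameters" means, here is a minimal sketch of least squares for a straight line y = slope * x + intercept, using the standard closed-form solution. The function name `least_squares_line` and the sample data are assumptions made for this example.

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = least_squares_line(xs, ys)
```

Because the sample points are noise-free, the fit recovers the line exactly; with real measurements the same formulas return the line that minimizes the squared error.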
Alan Turing (1936)
An "effectively computable" procedure is
supposed to be one that can be performed by
systematic application of clearly specified rules,
without requiring any inspirational leaps or
spontaneous intellectual insights.
Hirotugu Akaike & the AIC (1974)
Hirotugu Akaike (November 5, 1927 – August 4,
2009) was a Japanese statistician. In the early
1970s he formulated a criterion for model
selection, the Akaike information criterion
(AIC), which he thought of while riding the
train.
“On the morning of March 16, 1971, while
taking a seat in a commuter train, I suddenly
realized that the parameters of the factor
analysis model were estimated by
maximizing the likelihood and that the
mean value of the logarithmus of the
likelihood was connected with the Kullback-
Leibler information number. This was the
quantity that was to replace the mean
squared error of prediction”
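The criterion is usually written AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; among candidate models, the one with the lowest AIC is preferred. A minimal sketch follows. The two models and their log-likelihood values are hypothetical, chosen only to show the trade-off between fit and complexity.

```python
def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical fits of two models to the same data set.
simple_aic = aic(log_likelihood=-120.0, num_params=2)   # fewer parameters
complex_aic = aic(log_likelihood=-118.5, num_params=5)  # slightly better fit

preferred = "simple" if simple_aic < complex_aic else "complex"
```

Here the more complex model fits slightly better, but not by enough to offset the penalty for its three extra parameters, so the simpler model is preferred.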
Data Science History
1943: McCulloch and Pitts wrote "A Logical Calculus of the Ideas Immanent in Nervous
Activity", in which they describe the idea of a neuron in a network. Each of these neurons
can do three things: receive inputs, process inputs and generate output.
1989: The term "Knowledge Discovery in Databases" (KDD) was coined by Gregory
Piatetsky-Shapiro. It was also at this time that he co-founded the first workshop of the
same name.
1990s: The term "data mining" appeared in the database community. Retail companies
and the financial community began using data mining to analyze data and recognize trends
in order to increase their customer base.
1992: Boser, Guyon and Vapnik suggested an improvement on the original support
vector machine that allows for the creation of nonlinear classifiers. Support vector
machines are a supervised learning approach that analyzes data and recognizes patterns;
they are used for classification and regression analysis.
2001: Although the term "data science" has existed since the 1960s, it wasn't until 2001
that William S. Cleveland introduced it as an independent discipline. DJ Patil and Jeff
Hammerbacher later used the term to describe their roles at LinkedIn and Facebook.
Anthony Goldbloom, Kaggle (2010)
Anthony Goldbloom (born 1983 in Australia) founded
Kaggle in 2010 as a Silicon Valley startup
focused on predictive analytics.
Andrew Ng of Baidu, Coursera (2011)
Andrew Yan-Tak Ng (born 1976) is Chief Scientist
at Baidu Research in Silicon Valley. In addition,
he is an associate professor in the Department of
Computer Science at Stanford University. He is
chairman of the board of Coursera, an online
education platform that provides data science
courses online.
In 2011, Ng founded the Google Brain project, which developed very
large-scale artificial neural networks using Google's distributed computing
infrastructure. Among its notable results was a neural network trained
using deep learning algorithms that learned to recognize cats from
unlabeled YouTube video frames alone.
Branches of Data Science
Natural Language Processing (NLP)
Deep Learning
Predictive Analytics
Text Analytics
Social Media Analytics
Image Processing
