SlideShare ist ein Scribd-Unternehmen logo
1 von 79
Downloaden Sie, um offline zu lesen
Data-driven modeling
                                           APAM E4990


                                           Jake Hofman

                                          Columbia University


                                         January 23, 2012




Jake Hofman   (Columbia University)        Data-driven modeling   January 23, 2012   1 / 19
Data-dependent products



              • Effective/practical systems that learn from experience impact
                our daily lives, e.g.:
                    •   Recommendation systems
                    •   Spam detection
                    •   Optical character recognition
                    •   Face recognition
                    •   Fraud detection
                    •   Machine translation
                    •   ...




Jake Hofman    (Columbia University)       Data-driven modeling   January 23, 2012   2 / 19
Learning by example




Jake Hofman   (Columbia University)   Data-driven modeling   January 23, 2012   3 / 19
Learning by example




              • How did you solve this problem?
              • Can you make this process explicit (e.g. write code to do so)?


Jake Hofman    (Columbia University)   Data-driven modeling        January 23, 2012   3 / 19
Learning by example




              • We learn quickly from few, relatively unstructured examples ...
                but we don’t understand how we accomplish this
              • We’d like to develop algorithms that enable machines to learn
                by example from large data sets

Jake Hofman    (Columbia University)   Data-driven modeling         January 23, 2012   4 / 19
Got data?
              • Web service APIs expose lots of data




Jake Hofman    (Columbia University)   Data-driven modeling   January 23, 2012   5 / 19
Got data?


              • Many free, public data sets available online




Jake Hofman    (Columbia University)   Data-driven modeling    January 23, 2012   6 / 19
Black-boxified?




Jake Hofman   (Columbia University)   Data-driven modeling   January 23, 2012   7 / 19
Black-boxified?




Jake Hofman   (Columbia University)   Data-driven modeling   January 23, 2012   7 / 19
Roadmap?




                                      Step 1: Have data
                                      Step 2: ???
                                      Step 3: Profit




Jake Hofman   (Columbia University)      Data-driven modeling   January 23, 2012   8 / 19
Roadmap, take two


              1    Get data




Jake Hofman       (Columbia University)   Data-driven modeling   January 23, 2012   9 / 19
Roadmap, take two


              1    Get data
              2    Visualize/perform sanity checks
              3    Clean/filter observations
              4    Choose features to represent data




Jake Hofman       (Columbia University)   Data-driven modeling   January 23, 2012   9 / 19
Roadmap, take two


              1    Get data
              2    Visualize/perform sanity checks
              3    Clean/filter observations
              4    Choose features to represent data
              5    Specify model
              6    Specify loss function




Jake Hofman       (Columbia University)    Data-driven modeling   January 23, 2012   9 / 19
Roadmap, take two


              1    Get data
              2    Visualize/perform sanity checks
              3    Clean/filter observations
              4    Choose features to represent data
              5    Specify model
              6    Specify loss function
              7    Develop algorithm to minimize loss




Jake Hofman       (Columbia University)    Data-driven modeling   January 23, 2012   9 / 19
Roadmap, take two


              1    Get data
              2    Visualize/perform sanity checks
              3    Clean/filter observations
              4    Choose features to represent data
              5    Specify model
              6    Specify loss function
              7    Develop algorithm to minimize loss
              8    Choose performance measure
              9    “Train” to minimize loss
          10       “Test” to evaluate generalization



Jake Hofman       (Columbia University)    Data-driven modeling   January 23, 2012   9 / 19
Topics


        • Supervised                                    • Unsupervised
            • k-nearest neighbors                           • K-means
            • Naive Bayes                                   • Mixture models
            • Linear regression                             • Principal components
            • Logistic regression                             analysis
            • Support vector machines                       • Topic models
            • Collaborative filtering
            • Matrix factorization


              • Data representation: feature space, selection, normalization
              • Model assessment: complexity control, cross-validation, ROC
                curve, Bayesian Occam’s razor
              • Large-scale learning


Jake Hofman    (Columbia University)   Data-driven modeling              January 23, 2012   10 / 19
Everything old is new again1


              • Many fields ...
                    • Statistics
                    • Pattern recognition
                    • Data mining
                    • Machine learning
              • ... similar goals
                    • Extract and recognize patterns in data
                    • Interpret or explain observations
                    • Test validity of hypotheses
                    • Efficiently search the space of hypotheses
                    • Design efficient algorithms enabling machines to learn from
                      data




              1
                  http://cbcl.mit.edu/publications/theses/thesis-rifkin.pdf
Jake Hofman       (Columbia University)   Data-driven modeling        January 23, 2012   11 / 19
Statistics vs. machine learning2


                                                  Glossary

                          Machine learning                    Statistics


                          network, graphs                     model


                          weights                             parameters


                          learning                            fitting


                          generalization                      test set performance


                          supervised learning                 regression/classification


                          unsupervised learning               density estimation, clustering


                          large grant = $1,000,000            large grant= $50,000


                          nice place to have a meeting:       nice place to have a meeting:
                          Snowbird, Utah, French Alps         Las Vegas in August




                                                          1
         2
            http:
        //anyall.org/blog/2008/12/statistics-vs-machine-learning-fight/
Jake Hofman (Columbia University) Data-driven modeling        January 23, 2012                 12 / 19
Philosophy

              • We would like models that:
                 • Provide predictive and explanatory power
                 • Are complex enough to describe observed phenomena
                 • Are simple enough to generalize to future observations




Jake Hofman    (Columbia University)   Data-driven modeling          January 23, 2012   13 / 19
Example: Netflix Prize




Jake Hofman   (Columbia University)   Data-driven modeling   January 23, 2012   14 / 19
Example: Netflix Prize




Jake Hofman   (Columbia University)   Data-driven modeling   January 23, 2012   15 / 19
Shipping = Feature




Jake Hofman   (Columbia University)   Data-driven modeling   January 23, 2012   16 / 19
References




Jake Hofman   (Columbia University)   Data-driven modeling   January 23, 2012   17 / 19
Disclaimer


       You may be bored if you already know how to ...
              • Acquire data from APIs
              • Clean/explore/visualize data
              • Classify and cluster various types of data (e.g., images, text)
              • Code in Python, R, SciPy/NumPy, etc.
              • Scale solutions to large data sets (e.g. Hadoop, SGD)
              • Script with unix tools on the command line, e.g.

                $ sed -e ’s/<[^>]*>//g’ < page.html > page.txt




Jake Hofman    (Columbia University)   Data-driven modeling         January 23, 2012   18 / 19
Themes




                                      Data jeopardy


         Regardless of scale, it’s difficult to find the right questions to ask
                                    of the data




Jake Hofman   (Columbia University)    Data-driven modeling      January 23, 2012   19 / 19
Themes




                                      Data hacking


        Cleaning and normalizing data is a substantial amount of the work
                        (and likely impacts results)




Jake Hofman   (Columbia University)    Data-driven modeling   January 23, 2012   19 / 19
Themes




                                      Data hacking


               The ability to iterate quickly, asking and answering many
                                 questions, is crucial




Jake Hofman   (Columbia University)    Data-driven modeling      January 23, 2012   19 / 19
Themes




                                      Data hacking


                  Hacks happen: sed/awk/grep are useful, and scale




Jake Hofman   (Columbia University)    Data-driven modeling     January 23, 2012   19 / 19
Themes




                                      “Data science”


              Simple methods (e.g., linear models) work surprisingly well,
                            especially with lots of data




Jake Hofman   (Columbia University)    Data-driven modeling       January 23, 2012   19 / 19
Themes




                                      “Data science”


              It’s easy to cover your tracks—things are often much more
                             complicated than they appear




Jake Hofman   (Columbia University)    Data-driven modeling    January 23, 2012   19 / 19
Predicting consumer activity with Web search
with Sharad Goel, S´bastien Lahaie, David Pennock, Duncan Watts
                   e
                                                  "Right Round"
                                                                                        c

                                10



                                20
                         Rank




                                30


                                              Billboard
                                40            Search

                                     Mar−09   Apr−09 May−09 Jun−09             Jul−09   Aug−09
                                                          Week




Jake Hofman   (Yahoo! Research)                Learning from Online Activity                 November 30, 2011   6 / 71
Search predictions
Motivation



                                                                             "Right Round"
                                                                                                            c

                                                           10



   Does collective search activity                         20




                                                    Rank
   provide useful predictive signal
   about real-world outcomes?                              30


                                                                         Billboard
                                                           40            Search

                                                                Mar−09   Apr−09 May−09 Jun−09      Jul−09   Aug−09
                                                                                     Week




Jake Hofman   (Yahoo! Research)   Learning from Online Activity                             November 30, 2011   7 / 71
Search predictions
Motivation




        Past work mainly focuses on predicting the present1 and ignores
              baseline models trained on publicly available data

                                    8                                                                Actual
                                    7                                                                Search
              Flu Level (Percent)




                                    6                                                                Autoregressive
                                    5
                                    4
                                    3
                                    2
                                    1

                                        2004        2005      2006            2007         2008   2009           2010
                                                                       Date




          1
              Varian, 2009
Jake Hofman                     (Yahoo! Research)          Learning from Online Activity             November 30, 2011   8 / 71
Search predictions
Motivation




                            We predict future sales for movies, video games, and music

                                   "Transformers 2"                                    "Tom Clancy's HAWX"                                          "Right Round"
                                                            a                                                         b                                                              c

                                                                                                                                  10
      Search Volume




                                                                 Search Volume


                                                                                                                                  20




                                                                                                                           Rank
                                                                                                                                  30


                                                                                                                                                Billboard
                                                                                                                                  40            Search

                      −30    −20     −10   0    10     20   30                   −30   −20   −10   0     10      20   30               Mar−09   Apr−09 May−09      Jun−09   Jul−09   Aug−09
                              Time to Release (Days)                                    Time to Release (Days)                                              Week




Jake Hofman                  (Yahoo! Research)                                   Learning from Online Activity                                           November 30, 2011                    9 / 71
Search predictions
Search models




      For movies and video games, predict opening weekend box office
      and first month sales, respectively:

                           log(revenue) = β0 + β1 log(search) + 

      For music, predict following week’s Billboard Hot 100 rank:

                   billboardt+1 = β0 + β1 searcht + β2 searcht−1 + 




Jake Hofman   (Yahoo! Research)      Learning from Online Activity   November 30, 2011   10 / 71
Search predictions
Search volume




Jake Hofman   (Yahoo! Research)   Learning from Online Activity   November 30, 2011   11 / 71
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01
Data-driven modeling: Lecture 01

Weitere ähnliche Inhalte

Was ist angesagt?

The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecyclebartlowe
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IMachine Learning Valencia
 
Heuristc Search Techniques
Heuristc Search TechniquesHeuristc Search Techniques
Heuristc Search TechniquesJismy .K.Jose
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximizationbutest
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reductionmrizwan969
 
Data analysis
Data analysisData analysis
Data analysisamlbinder
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptxkibriaswe
 
Data mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityData mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityRushali Deshmukh
 
Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Kazi Toufiq Wadud
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learningSANTHOSH RAJA M G
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabCloudxLab
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis Peter Reimann
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
Machine Learning - Dataset Preparation
Machine Learning - Dataset PreparationMachine Learning - Dataset Preparation
Machine Learning - Dataset PreparationAndrew Ferlitsch
 
Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modellinglalit Lalitm7225
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsMichał Łopuszyński
 
Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data miningDhilsath Fathima
 

Was ist angesagt? (20)

The Data Warehouse Lifecycle
The Data Warehouse LifecycleThe Data Warehouse Lifecycle
The Data Warehouse Lifecycle
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
Heuristc Search Techniques
Heuristc Search TechniquesHeuristc Search Techniques
Heuristc Search Techniques
 
Lecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation MaximizationLecture 18: Gaussian Mixture Models and Expectation Maximization
Lecture 18: Gaussian Mixture Models and Expectation Maximization
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Data analysis
Data analysisData analysis
Data analysis
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Data mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarityData mining Measuring similarity and desimilarity
Data mining Measuring similarity and desimilarity
 
Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?
 
Ensemble methods in machine learning
Ensemble methods in machine learningEnsemble methods in machine learning
Ensemble methods in machine learning
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLab
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Applications of hybrid systems
Applications of hybrid systemsApplications of hybrid systems
Applications of hybrid systems
 
Machine Learning - Dataset Preparation
Machine Learning - Dataset PreparationMachine Learning - Dataset Preparation
Machine Learning - Dataset Preparation
 
Predictive analysis and modelling
Predictive analysis and modellingPredictive analysis and modelling
Predictive analysis and modelling
 
CRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining ProjectsCRISP-DM - Agile Approach To Data Mining Projects
CRISP-DM - Agile Approach To Data Mining Projects
 
Unit 3 part ii Data mining
Unit 3 part ii Data miningUnit 3 part ii Data mining
Unit 3 part ii Data mining
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 

Andere mochten auch

Paris e suas igrejas
Paris e suas igrejasParis e suas igrejas
Paris e suas igrejasfilipj2000
 
Summarizing ppoint
Summarizing ppointSummarizing ppoint
Summarizing ppointSusan Isbell
 
Lb oso polar-polar bear
Lb oso polar-polar bearLb oso polar-polar bear
Lb oso polar-polar bearfilipj2000
 
Jim sterne terametric_twitter-webinar_120910
Jim sterne terametric_twitter-webinar_120910Jim sterne terametric_twitter-webinar_120910
Jim sterne terametric_twitter-webinar_120910Terametric
 
Defrag: Applying Twitter Analytics in Real Time
Defrag: Applying Twitter Analytics in Real TimeDefrag: Applying Twitter Analytics in Real Time
Defrag: Applying Twitter Analytics in Real TimeTerametric
 
Hieu qua san xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao Viet
Hieu qua san  xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao VietHieu qua san  xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao Viet
Hieu qua san xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao VietHo Cao Viet
 
Visualizing geolocated Internet measurements
Visualizing geolocated Internet measurementsVisualizing geolocated Internet measurements
Visualizing geolocated Internet measurementsClaudio Squarcella
 
7강 기업교육론 20110413
7강 기업교육론 201104137강 기업교육론 20110413
7강 기업교육론 20110413조현경
 
Stroke symposiuma tpaper91411
Stroke symposiuma tpaper91411Stroke symposiuma tpaper91411
Stroke symposiuma tpaper91411Pat Maher
 
10강 기업교육론 20110504
10강 기업교육론 2011050410강 기업교육론 20110504
10강 기업교육론 20110504조현경
 
기업교육론 6장 학생발표자료
기업교육론 6장 학생발표자료기업교육론 6장 학생발표자료
기업교육론 6장 학생발표자료조현경
 
Learning from Web Activity
Learning from Web ActivityLearning from Web Activity
Learning from Web Activityjakehofman
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216Pat Maher
 

Andere mochten auch (20)

Creativity
CreativityCreativity
Creativity
 
移动产品界面适配设计
移动产品界面适配设计移动产品界面适配设计
移动产品界面适配设计
 
Niver helen
Niver helenNiver helen
Niver helen
 
Maine prevention savings 11 22-10
Maine prevention savings 11 22-10Maine prevention savings 11 22-10
Maine prevention savings 11 22-10
 
Paris e suas igrejas
Paris e suas igrejasParis e suas igrejas
Paris e suas igrejas
 
Tecnicas
TecnicasTecnicas
Tecnicas
 
Baile
Baile Baile
Baile
 
Summarizing ppoint
Summarizing ppointSummarizing ppoint
Summarizing ppoint
 
Lb oso polar-polar bear
Lb oso polar-polar bearLb oso polar-polar bear
Lb oso polar-polar bear
 
Jim sterne terametric_twitter-webinar_120910
Jim sterne terametric_twitter-webinar_120910Jim sterne terametric_twitter-webinar_120910
Jim sterne terametric_twitter-webinar_120910
 
Defrag: Applying Twitter Analytics in Real Time
Defrag: Applying Twitter Analytics in Real TimeDefrag: Applying Twitter Analytics in Real Time
Defrag: Applying Twitter Analytics in Real Time
 
Hieu qua san xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao Viet
Hieu qua san  xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao VietHieu qua san  xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao Viet
Hieu qua san xuat bap lai tren dat lua Dong bang song Cuu Long-TS. Ho Cao Viet
 
Visualizing geolocated Internet measurements
Visualizing geolocated Internet measurementsVisualizing geolocated Internet measurements
Visualizing geolocated Internet measurements
 
7강 기업교육론 20110413
7강 기업교육론 201104137강 기업교육론 20110413
7강 기업교육론 20110413
 
Stroke symposiuma tpaper91411
Stroke symposiuma tpaper91411Stroke symposiuma tpaper91411
Stroke symposiuma tpaper91411
 
10강 기업교육론 20110504
10강 기업교육론 2011050410강 기업교육론 20110504
10강 기업교육론 20110504
 
Spectra-Profile
Spectra-ProfileSpectra-Profile
Spectra-Profile
 
기업교육론 6장 학생발표자료
기업교육론 6장 학생발표자료기업교육론 6장 학생발표자료
기업교육론 6장 학생발표자료
 
Learning from Web Activity
Learning from Web ActivityLearning from Web Activity
Learning from Web Activity
 
Maherprofessional c vbio216
Maherprofessional c vbio216Maherprofessional c vbio216
Maherprofessional c vbio216
 

Ähnlich wie Data-driven modeling: Lecture 01

Data-driven modeling: Lecture 02
Data-driven modeling: Lecture 02Data-driven modeling: Lecture 02
Data-driven modeling: Lecture 02jakehofman
 
Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09jakehofman
 
Computational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to CountingComputational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to Countingjakehofman
 
Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10jakehofman
 
Machine Learning: Learning with data
Machine Learning: Learning with dataMachine Learning: Learning with data
Machine Learning: Learning with dataONE Talks
 
One talk Machine Learning
One talk Machine LearningOne talk Machine Learning
One talk Machine LearningONE Talks
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingAkin Osman Kazakci
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningLarge Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learningjaumebp
 
MACHINE LEARNING LIFE CYCLE
MACHINE LEARNING LIFE CYCLEMACHINE LEARNING LIFE CYCLE
MACHINE LEARNING LIFE CYCLEBhimsen Joshi
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7CS, NcState
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Robert Williams
 
Machine-Learned Ranking using Distributed Parallel Genetic Programming
Machine-Learned Ranking using Distributed Parallel Genetic ProgrammingMachine-Learned Ranking using Distributed Parallel Genetic Programming
Machine-Learned Ranking using Distributed Parallel Genetic Programmingrusho1234
 
Connections b/w active learning and model extraction
Connections b/w active learning and model extractionConnections b/w active learning and model extraction
Connections b/w active learning and model extractionAnmol Dwivedi
 
Algorithmic Fairness: A Brief Introduction
Algorithmic Fairness: A Brief IntroductionAlgorithmic Fairness: A Brief Introduction
Algorithmic Fairness: A Brief IntroductionAnthonyMelson
 

Ähnlich wie Data-driven modeling: Lecture 01 (20)

Data-driven modeling: Lecture 02
Data-driven modeling: Lecture 02Data-driven modeling: Lecture 02
Data-driven modeling: Lecture 02
 
Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09Data-driven modeling: Lecture 09
Data-driven modeling: Lecture 09
 
Computational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to CountingComputational Social Science, Lecture 02: An Introduction to Counting
Computational Social Science, Lecture 02: An Introduction to Counting
 
Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10Data-driven modeling: Lecture 10
Data-driven modeling: Lecture 10
 
Machine Learning: Learning with data
Machine Learning: Learning with dataMachine Learning: Learning with data
Machine Learning: Learning with data
 
One talk Machine Learning
One talk Machine LearningOne talk Machine Learning
One talk Machine Learning
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototyping
 
10 best practices in operational analytics
10 best practices in operational analytics 10 best practices in operational analytics
10 best practices in operational analytics
 
Large Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine LearningLarge Scale Data Mining using Genetics-Based Machine Learning
Large Scale Data Mining using Genetics-Based Machine Learning
 
Lecture-2 Applied ML .pptx
Lecture-2 Applied ML .pptxLecture-2 Applied ML .pptx
Lecture-2 Applied ML .pptx
 
MACHINE LEARNING LIFE CYCLE
MACHINE LEARNING LIFE CYCLEMACHINE LEARNING LIFE CYCLE
MACHINE LEARNING LIFE CYCLE
 
Dm sei-tutorial-v7
Dm sei-tutorial-v7Dm sei-tutorial-v7
Dm sei-tutorial-v7
 
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...Altron presentation on Emerging Technologies: Data Science and Artificial Int...
Altron presentation on Emerging Technologies: Data Science and Artificial Int...
 
Machine-Learned Ranking using Distributed Parallel Genetic Programming
Machine-Learned Ranking using Distributed Parallel Genetic ProgrammingMachine-Learned Ranking using Distributed Parallel Genetic Programming
Machine-Learned Ranking using Distributed Parallel Genetic Programming
 
Connections b/w active learning and model extraction
Connections b/w active learning and model extractionConnections b/w active learning and model extraction
Connections b/w active learning and model extraction
 
Algorithmic Fairness: A Brief Introduction
Algorithmic Fairness: A Brief IntroductionAlgorithmic Fairness: A Brief Introduction
Algorithmic Fairness: A Brief Introduction
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 

Mehr von jakehofman

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2jakehofman
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1jakehofman
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networksjakehofman
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classificationjakehofman
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationjakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1jakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in Rjakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overviewjakehofman
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systemsjakehofman
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayesjakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studiesjakehofman
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Sciencejakehofman
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classificationjakehofman
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regressionjakehofman
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experimentsjakehofman
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wranglingjakehofman
 

Mehr von jakehofman (20)

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systems
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 

Kürzlich hochgeladen

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 

Kürzlich hochgeladen (20)

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 

Data-driven modeling: Lecture 01

  • 1. Data-driven modeling APAM E4990 Jake Hofman Columbia University January 23, 2012 Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 1 / 19
  • 2. Data-dependent products • Effective/practical systems that learn from experience impact our daily lives, e.g.: • Recommendation systems • Spam detection • Optical character recognition • Face recognition • Fraud detection • Machine translation • ... Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 2 / 19
  • 3. Learning by example Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 3 / 19
  • 4. Learning by example • How did you solve this problem? • Can you make this process explicit (e.g. write code to do so)? Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 3 / 19
  • 5. Learning by example • We learn quickly from few, relatively unstructured examples ... but we don’t understand how we accomplish this • We’d like to develop algorithms that enable machines to learn by example from large data sets Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 4 / 19
  • 6. Got data? • Web service APIs expose lots of data Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 5 / 19
  • 7. Got data? • Many free, public data sets available online Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 6 / 19
  • 8. Black-boxified? Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 7 / 19
  • 9. Black-boxified? Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 7 / 19
  • 10. Roadmap? Step 1: Have data Step 2: ??? Step 3: Profit Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 8 / 19
  • 11. Roadmap, take two 1 Get data Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 9 / 19
  • 12. Roadmap, take two 1 Get data 2 Visualize/perform sanity checks 3 Clean/filter observations 4 Choose features to represent data Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 9 / 19
  • 13. Roadmap, take two 1 Get data 2 Visualize/perform sanity checks 3 Clean/filter observations 4 Choose features to represent data 5 Specify model 6 Specify loss function Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 9 / 19
  • 14. Roadmap, take two 1 Get data 2 Visualize/perform sanity checks 3 Clean/filter observations 4 Choose features to represent data 5 Specify model 6 Specify loss function 7 Develop algorithm to minimize loss Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 9 / 19
  • 15. Roadmap, take two 1 Get data 2 Visualize/perform sanity checks 3 Clean/filter observations 4 Choose features to represent data 5 Specify model 6 Specify loss function 7 Develop algorithm to minimize loss 8 Choose performance measure 9 “Train” to minimize loss 10 “Test” to evaluate generalization Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 9 / 19
  • 16. Topics • Supervised • Unsupervised • k-nearest neighbors • K-means • Naive Bayes • Mixture models • Linear regression • Principal components • Logistic regression analysis • Support vector machines • Topic models • Collaborative filtering • Matrix factorization • Data representation: feature space, selection, normalization • Model assessment: complexity control, cross-validation, ROC curve, Bayesian Occam’s razor • Large-scale learning Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 10 / 19
  • 17. Everything old is new again1 • Many fields ... • Statistics • Pattern recognition • Data mining • Machine learning • ... similar goals • Extract and recognize patterns in data • Interpret or explain observations • Test validity of hypotheses • Efficiently search the space of hypotheses • Design efficient algorithms enabling machines to learn from data 1 http://cbcl.mit.edu/publications/theses/thesis-rifkin.pdf Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 11 / 19
  • 18. Statistics vs. machine learning2 Glossary Machine learning Statistics network, graphs model weights parameters learning fitting generalization test set performance supervised learning regression/classification unsupervised learning density estimation, clustering large grant = $1,000,000 large grant= $50,000 nice place to have a meeting: nice place to have a meeting: Snowbird, Utah, French Alps Las Vegas in August 1 2 http: //anyall.org/blog/2008/12/statistics-vs-machine-learning-fight/ Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 12 / 19
  • 19. Philosophy • We would like models that: • Provide predictive and explanatory power • Are complex enough to describe observed phenomena • Are simple enough to generalize to future observations Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 13 / 19
  • 20. Example: Netflix Prize Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 14 / 19
  • 21. Example: Netflix Prize Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 15 / 19
  • 22. Shipping = Feature Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 16 / 19
  • 23. References Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 17 / 19
  • 24. Disclaimer You may be bored if you already know how to ... • Acquire data from APIs • Clean/explore/visualize data • Classify and cluster various types of data (e.g., images, text) • Code in Python, R, SciPy/NumPy, etc. • Scale solutions to large data sets (e.g. Hadoop, SGD) • Script with unix tools on the command line, e.g. $ sed -e ’s/<[^>]*>//g’ < page.html > page.txt Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 18 / 19
  • 25. Themes Data jeopardy Regardless of scale, it’s difficult to find the right questions to ask of the data Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 19 / 19
  • 26. Themes Data hacking Cleaning and normalizing data is a substantial amount of the work (and likely impacts results) Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 19 / 19
  • 27. Themes Data hacking The ability to iterate quickly, asking and answering many questions, is crucial Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 19 / 19
  • 28. Themes Data hacking Hacks happen: sed/awk/grep are useful, and scale Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 19 / 19
  • 29. Themes “Data science” Simple methods (e.g., linear models) work surprisingly well, especially with lots of data Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 19 / 19
  • 30. Themes “Data science” It’s easy to cover your tracks—things are often much more complicated than they appear Jake Hofman (Columbia University) Data-driven modeling January 23, 2012 19 / 19
  • 31.
  • 32. Predicting consumer activity with Web search with Sharad Goel, S´bastien Lahaie, David Pennock, Duncan Watts e "Right Round" c 10 20 Rank 30 Billboard 40 Search Mar−09 Apr−09 May−09 Jun−09 Jul−09 Aug−09 Week Jake Hofman (Yahoo! Research) Learning from Online Activity November 30, 2011 6 / 71
  • 33. Search predictions Motivation "Right Round" c 10 Does collective search activity 20 Rank provide useful predictive signal about real-world outcomes? 30 Billboard 40 Search Mar−09 Apr−09 May−09 Jun−09 Jul−09 Aug−09 Week Jake Hofman (Yahoo! Research) Learning from Online Activity November 30, 2011 7 / 71
  • 34. Search predictions Motivation Past work mainly focuses on predicting the present1 and ignores baseline models trained on publicly available data 8 Actual 7 Search Flu Level (Percent) 6 Autoregressive 5 4 3 2 1 2004 2005 2006 2007 2008 2009 2010 Date 1 Varian, 2009 Jake Hofman (Yahoo! Research) Learning from Online Activity November 30, 2011 8 / 71
  • 35. Search predictions Motivation We predict future sales for movies, video games, and music "Transformers 2" "Tom Clancy's HAWX" "Right Round" a b c 10 Search Volume Search Volume 20 Rank 30 Billboard 40 Search −30 −20 −10 0 10 20 30 −30 −20 −10 0 10 20 30 Mar−09 Apr−09 May−09 Jun−09 Jul−09 Aug−09 Time to Release (Days) Time to Release (Days) Week Jake Hofman (Yahoo! Research) Learning from Online Activity November 30, 2011 9 / 71
  • 36. Search predictions Search models For movies and video games, predict opening weekend box office and first month sales, respectively: log(revenue) = β0 + β1 log(search) + For music, predict following week’s Billboard Hot 100 rank: billboardt+1 = β0 + β1 searcht + β2 searcht−1 + Jake Hofman (Yahoo! Research) Learning from Online Activity November 30, 2011 10 / 71
  • 37. Search predictions Search volume Jake Hofman (Yahoo! Research) Learning from Online Activity November 30, 2011 11 / 71