Research Triangle Analysts
          rtpanalysts.org
 • Intro to Kaggle.com
 • Titanic Getting Started Competition
 • Prediction problem with two outcome levels
 • Opportunity for an extended data shootout, with Kaggle.com
   providing data, scoring, tutorials, and forums.
 • Public domain data allows for detailed discussion of modeling
   issues and solutions without client data confidentiality concerns.
 • A common ground for in-depth learning and debate on
   analytics topics.

 • Participants of all levels of expertise are welcome.
 • You influence the direction of this effort through your
   participation. Post questions and thoughts on
   rtpanalysts.org.
 • Welcome!



Slides by Linda Schumacher. Contact via Research Triangle Analysts LinkedIn group member list   1
Classification Problems
• Two levels or outcomes
• Data → Model → Predictions
• Examples
  – Find customers who are likely to buy a product
  – Identify patients likely to be admitted to hospital
  – Categorize cells as cancerous or benign
  – Who survives the Titanic disaster?
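The Data → Model → Predictions flow for a two-level outcome can be sketched in Python as a model with a fit step and a predict step. The class name MajorityClassifier and the 0/1 coding of the outcome are illustrative assumptions, not part of the slides. This baseline simply predicts whichever level is most common in the training labels, which gives a floor that any real classifier should beat.

```python
from collections import Counter
from typing import List, Optional, Sequence


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most common training label (0 or 1)."""

    def __init__(self) -> None:
        self.majority: Optional[int] = None

    def fit(self, labels: Sequence[int]) -> "MajorityClassifier":
        # Data -> Model: learn the single most frequent outcome level.
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_rows: int) -> List[int]:
        # Model -> Predictions: one label per new row.
        if self.majority is None:
            raise ValueError("call fit() before predict()")
        return [self.majority] * n_rows


model = MajorityClassifier().fit([0, 0, 1, 0, 1])
print(model.predict(3))  # [0, 0, 0]
```

Real classifiers such as trees or logistic regression replace the "most common label" rule with a rule that depends on each passenger's inputs.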


Classifier - Trees
• Decision Trees
  – All Passengers
    – Female
      – First Class
      – Second or Third Class
    – Male
      – Age < 16
      – Age >= 16
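The example tree can be read as nested if/else rules. A minimal sketch follows, assuming the inputs use the Kaggle train.csv codings (Sex as "female"/"male", Pclass as 1, 2 or 3, Age in years). The slide does not state a survival prediction for each leaf, so this hypothetical function tree_leaf only routes a passenger to a leaf label; the prediction at each leaf would be learned from the training data.

```python
from typing import Optional


def tree_leaf(sex: str, pclass: int, age: Optional[float]) -> str:
    """Route one passenger down the example tree and return the leaf label."""
    if pclass not in (1, 2, 3):
        raise ValueError(f"unexpected passenger class: {pclass!r}")
    if sex == "female":
        # Female branch splits on passenger class.
        return "Female, First Class" if pclass == 1 else "Female, Second or Third Class"
    if sex == "male":
        # Male branch splits on age at 16; this sketch assumes age is known.
        if age is None:
            raise ValueError("age is required for the male branch")
        return "Male, Age < 16" if age < 16 else "Male, Age >= 16"
    raise ValueError(f"unexpected sex value: {sex!r}")


print(tree_leaf("male", 3, 22.0))  # Male, Age >= 16
```

Missing ages are common in the Titanic data, so a real tree needs a policy for them (for example imputing a value); that choice is left open here.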

Classifier - Logistic Regression
• Equation – Logistic Regression
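• F(x) = sigmoid(b0 + b1·age + b2·class + b3·embarked + b4·gender), where sigmoid(z) = 1 / (1 + e^(-z)) and F(x) is the predicted probability of survival. The weights b0 to b4 are fitted from the training data; a negative weight lowers the score, as suggested by the minus sign on embarked.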
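The equation above can be evaluated in a few lines of Python. This is a sketch only: the weights and bias below are made-up numbers for illustration, not fitted values, and embarked is reduced to a single hypothetical 0/1 indicator embarked_s (1 for Southampton), a simplification of the three-valued port column. Gender is likewise coded as a hypothetical female indicator.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: maps any real score into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def survival_probability(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    # Linear score b0 + sum(weight * feature); a negative weight lowers the probability.
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights for illustration only; real values come from fitting train.csv.
weights = {"age": -0.03, "pclass": -0.9, "embarked_s": -0.4, "female": 2.5}
passenger = {"age": 30.0, "pclass": 3.0, "embarked_s": 1.0, "female": 1.0}
p = survival_probability(passenger, weights, bias=1.0)
print(round(p, 3))
```

Turning the probability into a two-level prediction usually means thresholding it, for example predicting survival when p is at least 0.5.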




Titanic Data
• Passenger List
  – Name, class, fare, port of embarkation, family
    members, age, cabin, etc.
  – Survival
• Training set of 891 passengers (survival outcome included)
• Test set of 418 passengers (survival outcome withheld)
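Once train.csv is downloaded, Python's standard csv module is enough for a first look. A minimal sketch, assuming the Kaggle column names (Survived coded 0/1, Age left blank when unknown); the three rows in the usage example are made up and use only a subset of the columns.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_passengers(f: TextIO) -> List[Dict[str, str]]:
    """Read a passenger CSV into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(f))


def survival_rate(rows: List[Dict[str, str]]) -> float:
    # Survived is coded 0/1 in the training file; test.csv has no Survived column.
    return sum(int(r["Survived"]) for r in rows) / len(rows)


# With the real file:
#     with open("train.csv", newline="") as f:
#         rows = load_passengers(f)
# Made-up sample rows for illustration:
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,38\n"
    "3,1,3,female,\n"
)
rows = load_passengers(sample)
print(len(rows), survival_rate(rows))
```

Note that every value is read as a string, and a missing age arrives as an empty string, so numeric columns need explicit conversion before modeling.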



Kaggle.com

• Data
• Tutorials
  – Tools – Excel, Python
  – Models – Trees, Random Forests
• Submission
• Leaderboard
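A submission is a small CSV file scored against the withheld test labels. The sketch below assumes the two-column layout PassengerId,Survived used by the competition's sample submission; check the competition page before uploading. The helper write_submission is hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def write_submission(f: TextIO, predictions: Dict[int, int]) -> None:
    """Write predictions as a two-column CSV: PassengerId, Survived (0 or 1)."""
    writer = csv.writer(f, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for passenger_id in sorted(predictions):
        label = predictions[passenger_id]
        if label not in (0, 1):
            raise ValueError(f"Survived must be 0 or 1, got {label!r}")
        writer.writerow([passenger_id, label])


buf = io.StringIO()
write_submission(buf, {893: 1, 892: 0})
print(buf.getvalue())
```

For a real upload, open the output file with newline="" and include one row for each of the 418 test passengers.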

Where to Start
• Create a Kaggle account
  http://www.kaggle.com/account/register
• Read the rules and agree to them if you choose to continue
• Enter the Kaggle Titanic competition
  http://www.kaggle.com/c/titanic-gettingStarted
• Download train.csv and test.csv
• If you choose to use R, download it from
  http://www.r-project.org/ You will have to choose a
  ‘mirror’ site, usually a university or research site
• If you share code or data outside of your Kaggle
  team, be sure to post a copy on the Kaggle Titanic forum;
  see http://www.kaggle.com/c/titanic-gettingStarted/details/rules
Benefits
• Extended Data Shoot-Out
• Tailor participation
• Opportunities
  -   New classifiers
  -   New tools, languages
  -   Training vs test error
  -   Round Table Discussion of Solutions
       - Compare model results
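Because the test labels are withheld by Kaggle, one common way to study training vs test error is to hold out part of train.csv as a validation set. A minimal sketch with hypothetical helper names; the 20% holdout fraction and the fixed seed are arbitrary choices, not something the slides prescribe.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(n_rows: int, holdout_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and split them into (training, validation) parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(n_rows * holdout_fraction))
    return indices[n_holdout:], indices[:n_holdout]


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of rows where the predicted 0/1 label differs from the true label.
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


train_idx, valid_idx = holdout_split(891)
print(len(train_idx), len(valid_idx))  # 713 178
```

Fitting on the training indices and comparing error_rate on both parts gives a rough picture: a model whose training error is much lower than its validation error is likely overfitting, and the leaderboard score will usually track the validation figure more closely.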

