Goldrush
Comments on the ICSE’14 panel session:
“After the gold rush”
tim.menzies@gmail.com
Dagstuhl workshop on SA
June 23, 2014
Organized by: Da bird, Da Men, Da Mann
1
Welcome to the Wild West
• Gold rush:
– To extract nuggets of insight.
• Too many inexperienced “cowboys”:
– Not using best or safe practices.
• “Goldfish bowl” panel:
– Best practices, and what not to do.
2
Streetlight effect
• Moral #1:
– Look at the real data, not just the conveniently available data.
• “Before we collect the data, need to redefine the right data to collect.”
• “Garbage in, garbage out.”
• “Before analyzing terabytes of data, reflect some on user goals.”
“I’m looking for my keys.”
3
Can we reason about data without detailed background knowledge?
• Traditional view:
– GQM (Goal/Question/Metric)
– Traditional science:
• Define the data to be collected
• Collect
• Infer
• “Newer” view:
– Operational science (Mockus, keynote, MSR, 2014)
• Find data
• Reason about it
• Prone to the “streetlight effect”
4
Empiricists : Rationalists
• Observation : use of background knowledge
– Mockus : Basili
– Norvig : Chomsky
– Locke : Leibniz
– Aristotle : Plato
– Newton : Descartes
• Ike (Isaac Newton) said “Hypotheses non fingo,” variously translated as:
– “I make no hypotheses.”
– “I feign no hypotheses.”
5
Working in the light
• Moral #2:
– “Before we rush to the new, let’s reflect on what we can learn from what we can see right now.”
• “Too fast, too early to expect actionable data for some specific area.”
• “Building models on what we have before pushing ahead.”
• “Need to invest more in data science infrastructure.”
“I’m exploring the data without preconceived biases.”
6
Full disclosure: Menzies is not rational
(he’s an empiricist)
Accidental discoveries
• America
• Penicillin
• Anesthesia
• Big Bang radiation
• Internal pacemakers
• Microwave ovens
• X-rays
• Safety glass
• Plastics (non-conductive, heat-resistant polymer)
• Vulcanised rubber
• Viagra
Wikipedia’s list of human cognitive biases
• 100+ entries
– The ways we routinely get it wrong, every day.
7
All (current) conclusions are wrong
Prone to revisions
• By subsequent analysis
• Any current model is:
– Wrong, but useful
• Timm’s Law:
– Fewer conclusions, more conversations
To find better conclusions …
• Just keep looking
• For a community to find better conclusions:
– Discuss more, share more
8
Between Turkish Toasters AND NASA Space Ships
9
Raw dimensions are less informative than underlying dimensions
10
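Slide 10’s claim, that a few underlying dimensions can carry more information than the raw columns, is the idea that principal component analysis (PCA) exploits. The Python below is a minimal, dependency-free sketch of that idea, not the method of any paper cited in this deck: it centres the data, builds the covariance matrix, and uses power iteration to find the first principal component. The function names and the synthetic data (three raw columns driven by one hidden factor plus a little noise) are assumptions chosen for illustration.

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean so every column is centred on zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of rows that are already mean-centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=200):
    """Power iteration: repeatedly multiply by the covariance matrix and
    renormalise. The vector converges to the eigenvector with the largest
    eigenvalue, i.e. the direction of greatest variance."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient v^T C v is the variance along v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance


def project(rows, v):
    """Coordinate of each centred row along the underlying dimension v."""
    return [sum(x * c for x, c in zip(row, v)) for row in rows]


if __name__ == "__main__":
    random.seed(1)
    hidden = [random.gauss(0, 1) for _ in range(200)]
    # Three raw columns, but only one hidden factor drives them (hypothetical data).
    rows = [[2 * h + random.gauss(0, 0.1),
             -h + random.gauss(0, 0.1),
             random.gauss(0, 0.1)] for h in hidden]
    centred = mean_center(rows)
    cov = covariance(centred)
    v, var1 = first_component(cov)
    total = sum(cov[i][i] for i in range(len(cov)))
    print("first component:", [round(x, 2) for x in v])
    print("share of total variance:", round(var1 / total, 3))
```

On data like this, nearly all of the variance should fall on one component, so a single projected column can stand in for the three raw ones. Real project data is rarely this clean, and a serious analysis would normally use a tested linear-algebra library rather than hand-rolled power iteration.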
Q: How do we TRANSFER lessons learned?
• Ignore most of the data
– Relevancy filtering: Turhan ESEj’09; Peters TSE’13
– Variance filtering: Kocaguneli TSE’12, TSE’13
– Performance similarities: He ESEM’13
• Contort the data
– Spectral learning (working in PCA space or some other rotation): Menzies TSE’13; Nam ICSE’13
• Build a bickering committee
– Ensembles: Minku PROMISE’12
11
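The first tactic on slide 11, ignoring most of the data, can be illustrated with nearest-neighbour relevancy filtering: when borrowing data from other projects, keep only the rows that resemble the local data you want to make predictions about. The Python below is a minimal sketch under stated assumptions (numeric features, Euclidean distance, and the union of each test row’s k nearest training rows, with k left as a tunable parameter); it is not claimed to reproduce the exact procedure of the papers cited above, and `relevancy_filter` is a hypothetical helper name.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevancy_filter(train, test, k=10):
    """Return the training rows that are among the k nearest neighbours
    of at least one test row, in their original order.

    train: list of (features, label) pairs, e.g. cross-project data.
    test:  list of feature vectors from the local project.
    """
    keep = set()
    for t in test:
        # Rank every training row by its distance to this test row.
        ranked = sorted(range(len(train)), key=lambda i: euclidean(train[i][0], t))
        keep.update(ranked[:k])
    return [train[i] for i in sorted(keep)]


if __name__ == "__main__":
    # Hypothetical cross-project rows and local rows, for illustration only.
    cross_project = [([1.0, 1.0], "clean"),
                     ([1.2, 0.9], "clean"),
                     ([9.0, 9.5], "buggy"),
                     ([50.0, 50.0], "buggy")]
    local = [[1.1, 1.0], [8.8, 9.2]]
    print(relevancy_filter(cross_project, local, k=1))
    # prints [([1.0, 1.0], 'clean'), ([9.0, 9.5], 'buggy')]
```

The outlying row at (50, 50) is never close to any local row, so it is dropped. This brute-force version computes one distance per (train, test) pair, which is fine for small data sets; larger ones would usually call for a spatial index.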
A little bit of this, a little bit of that
• Moral #3:
– “My Father’s house has many rooms.”
• “False dichotomy between these two approaches. Really need to bridge these two modes.”
• “Not the scientific method but scientific methods (plural).”
“Aha! Bus tracks! I can follow these to the bus stop and not drive home drunk.”
12

