Idea Engineering
tim@menzies.us
PROMISE’13, Oct’13
0. algorithm mining (yesterday)
1. landscape mining (today)
2. decision mining (tomorrow)
3. discussion mining (future)
The Premises of PROMISE (2005)
– Wanted: predictions
• Nope. Users want decisions, or engagement
– Data mining will reveal “the truth” about SE
• [Dejaeger: TSE’11], [Hall: TSE’12], [Shepperd:COW’13]
• Not(Better learners = better conclusions)
– Sooner or later: enough data for general conclusions
• Found more differences than generalities
• Special issues: [IST’13], [ESEj’13]
• Best papers: ASE’11, MSR’12
• Menzies, Zimmermann et al. [TSE’13]
• Lots of local models
Landscape mining: look before you leap
• Report what is true about the data
– Not trivia on how algorithms walk that data
• Map the landscape
– Reason on each part of the map
• E.g. landscape mining
– Unsupervised iterative dichotomization
– Cluster, prune
– Then generate rules
• Different from “leap before you look”
– i.e. skew learning by the class variable
– then study the results
• E.g. C4.5, CART, Fayyad-Irani, etc.
– Supervised iterative dichotomization
• E.g. 61% of 300+ effort estimation papers
– Algorithm tinkering, without end
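The unsupervised alternative can be sketched in a few lines of Python. This is a minimal sketch, assuming numeric rows, Euclidean distance, and a split at the median of a one-dimensional projection onto the line joining two distant rows; the names `dichotomize`, `farthest`, and `min_size` are hypothetical, and the code is not claimed to be the exact procedure behind these slides. No class variable is consulted anywhere: every split depends only on distances between rows.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest(row, rows):
    """Return the member of `rows` farthest from `row` (one linear scan)."""
    return max(rows, key=lambda r: dist(row, r))

def dichotomize(rows, min_size=4, rng=None):
    """Recursively halve `rows` along the line joining two distant rows.
    No class variable is used. Returns a list of leaf clusters."""
    rng = rng or random.Random(1)          # fixed seed for repeatable splits
    if len(rows) < 2 * max(1, min_size):   # too small to split further
        return [rows]
    anyone = rng.choice(rows)
    east = farthest(anyone, rows)          # one distant pole
    west = farthest(east, rows)            # the pole farthest from it
    c = dist(east, west)
    if c == 0:                             # all rows identical: nothing to split
        return [rows]

    def project(r):
        # cosine rule: position of r along the east-west line (east at 0, west at c)
        return (dist(r, east) ** 2 + c ** 2 - dist(r, west) ** 2) / (2 * c)

    ordered = sorted(rows, key=project)
    mid = len(ordered) // 2                # median split
    return (dichotomize(ordered[:mid], min_size, rng)
            + dichotomize(ordered[mid:], min_size, rng))

data = [[x, x % 3] for x in range(16)]
print([len(leaf) for leaf in dichotomize(data, min_size=4)])  # [4, 4, 4, 4]
```

On 16 distinct rows with `min_size=4`, the recursion stops at four leaves of four rows each; pruning and rule generation would then work on those leaves rather than on the raw data.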
Find landscape = cluster data, assign “heights”
Find decisions = report deltas from highs to lows
Monitor discussions = watch and help communities explore deltas
IDEA Engineering = <landscape, decisions, discussion>
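The first two steps of that equation can be sketched as follows. This is a minimal sketch, assuming clusters are lists of numeric rows and that a lower score is better, so the "highs" are the worst regions of the landscape and the "lows" the best. Using the mean score as a cluster's height, and the helper names `centroid`, `landscape`, and `delta_high_to_low`, are illustrative assumptions, not details from the slides.

```python
from statistics import mean

def centroid(rows):
    """Column-wise mean of a cluster of numeric rows."""
    return [mean(col) for col in zip(*rows)]

def landscape(clusters, score):
    """Pair each cluster's height (mean score of its rows) with its centroid.
    Assumption: lower scores are better."""
    return [(mean(score(r) for r in c), centroid(c)) for c in clusters]

def delta_high_to_low(clusters, score, names):
    """Attribute-by-attribute change from the highest (worst) cluster's
    centroid to the lowest (best) cluster's centroid."""
    hills = sorted(landscape(clusters, score), key=lambda hc: hc[0])
    best, worst = hills[0][1], hills[-1][1]
    return {n: b - w for n, b, w in zip(names, best, worst)}

clusters = [[[1, 10], [3, 12]], [[5, 2], [7, 4]]]
print(delta_high_to_low(clusters, score=lambda r: r[1], names=["x", "y"]))
```

With the second column used as the score, the high cluster sits at height 11 and the low one at height 3, so the reported delta is +4 on `x` and -8 on `y`.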
Spectral Landscape Mining
• Spectrum = a condition that is not limited to a specific set of values but varies in a continuum
• Groups together a broad range of conditions or behaviors under one single title
• In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues
• Nyström algorithms: approximations to eigenvalues
– FASTMAP: linear time
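The linear-time claim can be illustrated with a sketch of the classic FastMap projection (Faloutsos and Lin, 1995): each output dimension picks two distant pivots with a couple of linear scans, places every row on the pivot line using the cosine rule, and later dimensions work on the residual distance left after earlier coordinates are removed. This is a minimal sketch, assuming numeric rows and Euclidean distance; `fastmap` and its parameters are hypothetical names, and the random seed is fixed only for reproducibility.

```python
import math
import random

def fastmap(rows, k=2, rng=None):
    """FastMap-style projection of numeric rows into k dimensions.
    Each dimension costs O(n) distance computations, so for a fixed k
    the projection is linear in the number of rows."""
    rng = rng or random.Random(1)
    n = len(rows)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, dim):
        # squared Euclidean distance minus what the first `dim` coordinates explain
        full = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
        used = sum((coords[i][t] - coords[j][t]) ** 2 for t in range(dim))
        return max(full - used, 0.0)   # clamp tiny negative rounding errors

    for dim in range(k):
        start = rng.randrange(n)                              # random start
        a = max(range(n), key=lambda j: d2(start, j, dim))    # far from start
        b = max(range(n), key=lambda j: d2(a, j, dim))        # far from a
        dab = d2(a, b, dim)
        if dab == 0:        # no spread left; remaining coordinates stay 0
            break
        for i in range(n):
            # cosine rule: where row i falls on the a-b pivot line
            coords[i][dim] = (d2(a, i, dim) + dab - d2(b, i, dim)) / (2 * math.sqrt(dab))
    return coords
```

For points that already lie in a plane, `fastmap(rows, k=2)` preserves all pairwise distances up to rounding, which is a quick way to check an implementation before applying it to higher-dimensional data.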
Project data onto the first 2 principal components; grid that data
e.g. Nasa93dem
1) project 23 dimensions into 2
2a) cluster
2b) replace clusters with centroids
MOEA: score = effort + defects + months
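Steps 1 to 2b can be sketched as follows, assuming the 2-D coordinates have already been produced by some projection (for instance FASTMAP). Grid cells stand in for clusters here; the cell count, the helper names `grid_clusters`, `centroids`, and `score`, and the assumption that effort, defects, and months are normalized to comparable scales before they are summed are all illustrative choices, not details from the slide.

```python
from collections import defaultdict
from statistics import mean

def grid_clusters(points, rows, cells=4):
    """Bin 2-D points into a cells x cells grid; each non-empty cell is one
    cluster holding the original (full-dimensional) rows."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    (xlo, xhi), (ylo, yhi) = (min(xs), max(xs)), (min(ys), max(ys))

    def cell(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * cells), cells - 1)  # clamp the max edge

    groups = defaultdict(list)
    for (x, y), row in zip(points, rows):
        groups[(cell(x, xlo, xhi), cell(y, ylo, yhi))].append(row)
    return dict(groups)

def centroids(groups):
    """Step 2b: replace each cluster by the column-wise mean of its rows."""
    return {key: [mean(col) for col in zip(*rs)] for key, rs in groups.items()}

def score(effort, defects, months):
    """Hypothetical MOEA score from the slide (lower is better). Assumes the
    three objectives were normalized to comparable ranges beforehand."""
    return effort + defects + months
```

Keeping the full rows in each cell means the centroids live in the original attribute space, so later steps can read each centroid as an ordinary (if synthetic) project.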
Sanity check: what information is lost?
• E.g. POI-3
– 400+ examples
– 20 centroids
• Prediction via:
– Extrapolation between the two nearest centroids
• Works as well as:
– Random forest, Naïve Bayes for defect prediction (10 data sets)
– Linear regression, M5’ for effort estimation (10 data sets)
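The slide does not give the prediction formula, so the following is a hedged sketch under one plausible reading: take the two centroids nearest to a query and weight their known target values by inverse distance. Strictly, this interpolates between the two targets rather than extrapolating beyond them; `predict` and its arguments are hypothetical names.

```python
import math

def predict(query, cents, targets):
    """Estimate a target for `query` from its two nearest centroids,
    weighting each centroid's known target by inverse distance."""
    if len(cents) < 2:
        raise ValueError("need at least two centroids")
    i, j = sorted(range(len(cents)), key=lambda k: math.dist(query, cents[k]))[:2]
    di, dj = math.dist(query, cents[i]), math.dist(query, cents[j])
    if di == 0:                       # query sits exactly on a centroid
        return targets[i]
    wi, wj = 1 / di, 1 / dj
    return (wi * targets[i] + wj * targets[j]) / (wi + wj)
```

A query at `[2, 0]` lying between centroids at `[0, 0]` (target 0) and `[10, 0]` (target 10) gets an estimate of 2.0, the same as linear interpolation along the line joining them.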
Planning = inter-cluster contrast sets
• Find deltas between neighboring clusters that go from worse to better
• Very small rules, found in log-linear time
• Menzies et al. [TSE’13]
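Planning by contrast sets can be sketched as: for each cluster, move toward the nearest cluster that scores better, and keep only the few attributes that change most, which yields very small rules. This is a minimal sketch, assuming centroids plus a per-cluster height where lower is better; `contrast` and its parameters are hypothetical names. It is a brute-force loop, quadratic in the number of clusters (small after clustering), and does not reproduce the log-linear bound stated on the slide.

```python
import math

def contrast(cents, heights, names, top=2):
    """For each cluster, find the nearest cluster with a better (lower) height
    and report the `top` attributes whose centroid values differ most.
    Returns (from_index, to_index, [(name, change), ...]) triples."""
    plans = []
    for i, ci in enumerate(cents):
        better = [j for j in range(len(cents)) if heights[j] < heights[i]]
        if not better:
            continue            # no better-scoring cluster exists: no plan
        j = min(better, key=lambda k: math.dist(ci, cents[k]))
        changes = [(n, b - a) for n, a, b in zip(names, ci, cents[j])]
        changes.sort(key=lambda nc: abs(nc[1]), reverse=True)  # biggest moves first
        plans.append((i, j, changes[:top]))
    return plans

cents = [[0, 0, 5], [1, 0, 5], [1, 9, 5]]
print(contrast(cents, heights=[10, 5, 1], names=["a", "b", "c"], top=1))
```

Here cluster 0 is told to raise `a` by 1 to reach cluster 1, cluster 1 is told to raise `b` by 9 to reach cluster 2, and the best cluster gets no plan.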
Applications
• Prediction
• Planning
• Monitoring
• Multi-objective optimization
– Cluster first on N objectives
• Anomaly detection
• Incremental theory revision
• Compression
• Privacy
• etc
Beyond Data Mining, T. Menzies, IEEE Software, 2013, to appear
Q: why call it mining?
• A1: because all the primitives for the above are in the data mining literature
• So we know how to get from here to there
• A2: because data mining scales
