4. The Premises of PROMISE
(2005)
– Wanted: predictions
• Nope. Users want decisions, or engagement
– Data mining will reveal “the truth” about SE
• [Dejaeger: TSE’11], [Hall: TSE’12], [Shepperd:COW’13]
• Not(Better learners = better conclusions)
– Sooner or later: enough data for general conclusions
• Found more differences than generalities
• Special issues: [IST’13], [ESEj’13]
• Best papers, ASE’11, MSR’12
• Menzies, Zimmermann et al [TSE’13]
• Lots of local models
6. Landscape mining:
look before you leap
• Report what is true about the data
– Not trivia on how algorithms walk that data
• Map the landscape
– Reason on each part of the map
• E.g. landscape mining
– Unsupervised iterative dichotomization
– Cluster, prune
– Then generate rules
• Different from “leap before you look”
– i.e. skew learning by the class variable
– then study the results
• E.g. C4.5, CART, Fayyad–Irani, etc.
– Supervised iterative dichotomization
• E.g. 61% of 300+ effort estimation papers
– Algorithm tinkering, without end
8. Spectral Landscape Mining
• Spectrum = a condition that is not limited to a specific set of values but varies over a continuum
• Groups a broad range of conditions or behaviors under one single title
• In mathematics, the spectrum of a (finite-dimensional) matrix is the set of its eigenvalues
• Nyström algorithms: approximations to eigenvalues
– FASTMAP: linear time
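The FASTMAP step can be sketched as follows. This is a minimal Python illustration, not the talk's actual implementation; the function names and the use of Euclidean distance are assumptions:

```python
import math
import random

def dist(p, q):
    # Euclidean distance between two equal-length tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fastmap_axis(points, seed=1):
    # FASTMAP heuristic: two linear passes find two distant "pivot" points,
    # then each point is placed on the line joining them via the cosine rule.
    random.seed(seed)
    anyone = random.choice(points)
    east = max(points, key=lambda p: dist(p, anyone))  # far from anyone
    west = max(points, key=lambda p: dist(p, east))    # far from east
    c = dist(east, west)
    def project(p):
        a, b = dist(p, east), dist(p, west)
        return (a * a + c * c - b * b) / (2 * c) if c else 0.0
    return [project(p) for p in points]
```

Each axis costs only a few linear passes over the data, which is why FASTMAP serves as a linear-time stand-in for an eigenvalue decomposition; repeating the procedure yields the second axis of a 2-D projection.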
9. Project data onto the first 2 PCA dimensions; grid that data
e.g. Nasa93dem
1) project 23 dimensions into 2
2a) cluster
2b) replace clusters with centroids
MOEA: score = effort + defects + months
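Steps 2a and 2b above might look like the following toy sketch, where cells of a grid over the 2-D projection stand in for clusters and each occupied cell is replaced by the mean of its rows (the bin count and names are illustrative, not from the talk):

```python
from collections import defaultdict

def grid_centroids(points2d, rows, bins=4):
    # Grid the 2-D projection; each occupied cell is one "cluster",
    # and its rows are replaced by a single centroid (column-wise mean).
    xs = [x for x, _ in points2d]
    ys = [y for _, y in points2d]
    def cell(v, lo, hi):
        # Map a coordinate to a bin index in [0, bins-1].
        return min(bins - 1, int(bins * (v - lo) / (hi - lo))) if hi > lo else 0
    cells = defaultdict(list)
    for (x, y), row in zip(points2d, rows):
        cells[cell(x, min(xs), max(xs)), cell(y, min(ys), max(ys))].append(row)
    return {k: [sum(col) / len(rs) for col in zip(*rs)]
            for k, rs in cells.items()}
```

The payoff is compression: hundreds of rows shrink to a handful of centroids, one per occupied grid cell, over which later reasoning (scoring, planning) runs.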
10. Sanity check:
what information loss?
• E.g. POI-3
– 400+ examples
– 20 centroids
• Prediction via:
– Extrapolation between the two nearest centroids
• Works as well as:
– Random forest, Naïve Bayes for defect prediction (10 data sets)
– Linear regression, M5’ for effort estimation (10 data sets)
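One way to read "extrapolation between the two nearest centroids" is inverse-distance interpolation, sketched below; the weighting scheme is an assumption, not necessarily what the cited experiments used:

```python
import math

def predict(query, centroids, targets):
    # Find the two centroids nearest the query and interpolate their
    # target values, weighting the closer centroid more heavily.
    i, j = sorted(range(len(centroids)),
                  key=lambda k: math.dist(query, centroids[k]))[:2]
    a, b = math.dist(query, centroids[i]), math.dist(query, centroids[j])
    if a + b == 0:
        return targets[i]  # query coincides with both centroids
    w = b / (a + b)        # inverse-distance weight for the nearer centroid
    return w * targets[i] + (1 - w) * targets[j]
```

For example, a query midway between two centroids with targets 0 and 10 predicts 5; the point of the sanity check is that something this simple, over only 20 centroids, matched standard learners trained on all 400+ rows.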
11. Planning = inter-cluster contrast sets
• Find the delta between neighbors that go from worse to better
• Very small rules, found in log-linear time
• Menzies et al. [TSE’13]
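An inter-cluster contrast set can be illustrated as the largest attribute deltas between a "worse" centroid and its "better" neighbor. A toy sketch, with attribute names and the top-k cut-off invented for illustration:

```python
def plan(worse, better, names, top=2):
    # Contrast set: the attribute deltas that would move a "worse" cluster's
    # centroid toward its "better" neighbor; keep only the largest changes.
    deltas = {n: b - w for n, w, b in zip(names, worse, better) if b != w}
    return dict(sorted(deltas.items(), key=lambda kv: -abs(kv[1]))[:top])
```

Keeping only the few largest deltas is what makes the resulting rules "very small": the plan names a handful of attribute changes rather than a full model.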
12. Applications
• Prediction
• Planning
• Monitoring
• Multi-objective optimization
– Cluster first on N objectives
• Anomaly detection
• Incremental theory revision
• Compression
• Privacy
• etc
13. Idea Engineering
0. algorithm mining (yesterday)
1. landscape mining (today)
2. decision mining (tomorrow)
3. discussion mining (future)
Beyond Data Mining, T. Menzies, IEEE Software, 2013, to appear
Q: why call it mining?
• A1: because all the primitives for the above are in the data mining literature
• So we know how to get from here to there
• A2: because data mining scales