

An Adaptive Filter-Framework for the Quality Improvement of Open-Source Software Analysis

Anna Hannemann, Michael Hackstein, Ralf Klamma, Matthias Jarke
Advanced Community Information Systems (ACIS)
Lehrstuhl Informatik 5 (Information Systems), Prof. Dr. M. Jarke
RWTH Aachen University, Germany

This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Open Source Software Projects

- Community-driven development
- Voluntary participation
- Communication, project management and development via Web tools
- Some successful and famous examples
- Smaller niche projects
- A long tail of unsuccessful projects

Open Source Software Analysis for Software Engineering

- Understand, model, simulate and organize community-driven development
- Agile development practices
- Distributed and intercultural practices
- New success factors
- Long-term freely available datasets
- Low-cost empirical studies

Open Source Software Analysis: Research Results

Scacchi, “The Future of Research in Free/Open Source Software Development”, 2010

Techniques for Knowledge Mining in Development Repositories

- Results are only as good as the data!
- Remember the DNA phantom?
  “A hypothesized unknown female serial killer as a result of contaminated cotton swabs used for collecting DNA”
- Mine data, not noise!
  Cleaning of artifacts from communication and development repositories is needed

Data Cleaning for Knowledge Mining in Development Repositories

- Data-structure independence: variable artifact types
- Additive filtering: filter only new data
- Filter nesting: sequence of arbitrary order
- Consistent data format: cross-medium analysis
- Consistent and easy-to-use interface
- Extensibility: continuous evolution
- Adaptive database insertion

Adaptive-Filtering Approach: Cross-Media Mapping

Artifact types
- Mail
- Comment
- Post
- ...
Cross-media mapping
- Assignment of semantic meaning to artifact elements
- Extensibility to new data sources
- Same filters for different data
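
The cross-media mapping above can be sketched as follows. This is a minimal illustration, not the framework's actual data model: the `Artifact` class, its field names, the raw record keys and the `FIELD_MAPS` table are all assumptions made for the example. It shows how assigning a semantic role to each element of a mail, comment or post lets one function operate on every artifact type.

```python
from dataclasses import dataclass
from typing import Dict


# Hypothetical unified format; the field names are assumptions, not from the slides.
@dataclass(frozen=True)
class Artifact:
    source: str  # artifact type, e.g. "mail", "comment", "post"
    author: str
    title: str
    body: str


# One mapping per artifact type: semantic role -> key in the raw record.
# Supporting a new data source only requires a new entry here.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "mail": {"author": "from", "title": "subject", "body": "text"},
    "comment": {"author": "user", "title": "issue", "body": "comment"},
    "post": {"author": "nick", "title": "thread", "body": "message"},
}


def to_artifact(source: str, raw: Dict[str, str]) -> Artifact:
    """Translate a raw record into the unified format via its field mapping."""
    mapping = FIELD_MAPS[source]
    return Artifact(source=source, **{role: raw[key] for role, key in mapping.items()})


def word_count(artifact: Artifact) -> int:
    """Example of a task that works on any artifact type after mapping."""
    return len(artifact.body.split())


mail = to_artifact(
    "mail", {"from": "alice", "subject": "Build fails", "text": "The build fails on Linux"}
)
post = to_artifact(
    "post", {"nick": "bob", "thread": "Install help", "message": "How do I install it"}
)
print(word_count(mail), word_count(post))
```

Keeping the mapping in a table rather than in code is one possible way to get the extensibility the slide asks for; the real framework may solve this differently.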

Adaptive-Filtering Approach: Filter Nesting

- Sequence of filters F1, F2, …, FN
- Results in the same predefined format
- One filter, one cleaning (analysis) task
- Each filter triggers its predecessor
- Complex filter as a combination of several filters
- Filtering triggered on demand
- Filtering of a subset possible
- Simple filters first, and then analysis of the reduced data set with more filters of higher complexity
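
A minimal sketch of filter nesting, assuming a plain `dict` as the predefined common format. The `Filter` class and the two example tasks (`drop_empty`, `lowercase`) are hypothetical names chosen for illustration. Each filter pulls its input from its predecessor, so asking the last filter for results triggers the whole chain on demand.

```python
from typing import Callable, Iterable, Iterator, Optional

Artifact = dict  # stand-in for the predefined common format (assumption)
Task = Callable[[Iterator[Artifact]], Iterator[Artifact]]


class Filter:
    """One cleaning (or analysis) task that pulls its input from a predecessor."""

    def __init__(self, task: Task, predecessor: Optional["Filter"] = None):
        self.task = task
        self.predecessor = predecessor

    def run(self, data: Iterable[Artifact]) -> Iterator[Artifact]:
        # Requesting output triggers the predecessor first; nothing is
        # computed until the caller iterates over the result.
        upstream = self.predecessor.run(data) if self.predecessor else iter(data)
        return self.task(upstream)


def drop_empty(artifacts: Iterator[Artifact]) -> Iterator[Artifact]:
    """Simple reduction step: discard artifacts with an empty body."""
    return (a for a in artifacts if a["body"].strip())


def lowercase(artifacts: Iterator[Artifact]) -> Iterator[Artifact]:
    """Simple cleaning step: normalise the body to lower case."""
    return ({**a, "body": a["body"].lower()} for a in artifacts)


f1 = Filter(drop_empty)
f2 = Filter(lowercase, predecessor=f1)  # F2 triggers F1

data = [{"body": "Hello World"}, {"body": "   "}, {"body": "PATCH attached"}]
result = list(f2.run(data))        # whole data set
subset = list(f2.run(data[:1]))    # filtering of a subset
print(result)
```

Because input and output share one format, any filter can serve as predecessor of any other, and a complex filter is just a chain of simple ones.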

Adaptive-Filtering Approach: Multi-Threading

- Only new data is filtered
- Asynchronous processing: the filtered data subset is provided directly to the next analysis task
- Synchronous processing: wait until the complete data set is filtered
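
The two processing modes can be sketched with Python's standard `threading` and `queue` modules. All names here (`filter_new`, `run_async`, `run_sync`, the `id` key, the `seen_ids` set) are assumptions for illustration; the slides do not describe how the framework tracks already-filtered data or schedules threads.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List, Set

_DONE = object()  # sentinel marking the end of the filtered stream


def filter_new(artifacts: Iterable[dict], seen_ids: Set[int],
               keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield artifacts that pass `keep`, skipping ids handled in earlier runs."""
    for art in artifacts:
        if art["id"] in seen_ids:
            continue  # additive filtering: old data is not filtered again
        seen_ids.add(art["id"])
        if keep(art):
            yield art


def run_async(artifacts, seen_ids, keep, analyse) -> List:
    """Hand each filtered artifact to `analyse` as soon as it is available."""
    q: queue.Queue = queue.Queue()

    def producer() -> None:
        for art in filter_new(artifacts, seen_ids, keep):
            q.put(art)
        q.put(_DONE)

    worker = threading.Thread(target=producer)
    worker.start()
    results = []
    while True:
        item = q.get()
        if item is _DONE:
            break
        results.append(analyse(item))
    worker.join()
    return results


def run_sync(artifacts, seen_ids, keep, analyse) -> List:
    """Wait until the complete data set is filtered, then analyse it."""
    filtered = list(filter_new(artifacts, seen_ids, keep))
    return [analyse(a) for a in filtered]


artifacts = [{"id": 1, "body": "ok"}, {"id": 2, "body": ""}, {"id": 3, "body": "fine"}]
keep = lambda a: bool(a["body"])
analyse = lambda a: a["id"]

seen: Set[int] = set()
first = run_async(artifacts, seen, keep, analyse)
second = run_async(artifacts + [{"id": 4, "body": "new"}], seen, keep, analyse)
print(first, second)
```

In the second call only the artifact with id 4 is filtered, since the other ids are already recorded in `seen`.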

Dataset Reduction and Content Cleaning Filters

- Dataset Reduction Filter (DRF)
    – Reduces the number of artifacts
    – Selects artifacts that fulfill certain criteria
    – Example
        – Spam detection
        – Artifact classification based on the Bayes decision rule
- Content Cleaning Filter (CRF)
    – Modifies the content of artifacts
    – Example
        – Quotation filter
        – Detection of predefined patterns in content
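
Both filter kinds can be illustrated with a small sketch. The reduction step below applies the Bayes decision rule (pick the class with the highest posterior) using a naive word-independence model with add-one smoothing; that model, the toy training texts and all function names are assumptions, since the slides do not specify the classifier's features. The cleaning step assumes quotations are lines starting with `>`, as is common in mailing lists.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def train(samples: List[Tuple[str, str]]) -> Tuple[Dict[str, Counter], Counter]:
    """Collect per-class word counts and class counts from (text, label) pairs."""
    word_counts: Dict[str, Counter] = {}
    class_counts: Counter = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, class_counts


def classify(text: str, word_counts: Dict[str, Counter], class_counts: Counter) -> str:
    """Bayes decision rule: return the class with the highest log-posterior."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, counts in word_counts.items():
        score = math.log(class_counts[label] / total)  # log prior
        denom = sum(counts.values()) + len(vocab)      # add-one smoothing
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def reduce_dataset(texts: List[str], word_counts, class_counts) -> List[str]:
    """Dataset reduction: keep only artifacts not classified as spam."""
    return [t for t in texts if classify(t, word_counts, class_counts) != "spam"]


def strip_quotes(text: str) -> str:
    """Content cleaning: drop quoted lines (assumed to start with '>')."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith(">")]
    return "\n".join(kept)


# Toy training data, invented for this example only.
model = train([
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("patch for the parser bug", "ham"),
    ("release plan for the parser", "ham"),
])
kept = reduce_dataset(["buy cheap pills", "parser bug report"], *model)
cleaned = strip_quotes("> old text\nI agree.\n>> older\nThanks")
print(kept, repr(cleaned))
```

The reduction filter changes which artifacts remain, while the cleaning filter changes what each remaining artifact contains.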

Artifact Transformation Filters

- Filter as analysis task
- Modifies artifact attributes
- Example:
    – Core-Periphery Filter: separates the core of the community from the periphery
    – Hierarchical clustering based on power-law distribution
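
A transformation filter adds or changes attributes rather than removing data. The sketch below is a deliberately simplified stand-in for the Core-Periphery Filter, not the authors' clustering method: it performs a single split of contributors at the largest gap in log message counts, which is only one top-level step of a divisive clustering. The `author` and `role` keys and both function names are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple


def core_periphery(authors: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split authors into (core, periphery) at the largest gap in log activity."""
    counts = Counter(authors)
    # Rank by descending message count; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if len(ranked) < 2:
        return {a for a, _ in ranked}, set()
    logs = [math.log(n) for _, n in ranked]
    gaps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    if gaps[widest] == 0:
        # Everyone is equally active: no meaningful split.
        return {a for a, _ in ranked}, set()
    core = {a for a, _ in ranked[: widest + 1]}
    periphery = {a for a, _ in ranked[widest + 1:]}
    return core, periphery


def tag_roles(artifacts: List[dict]) -> List[dict]:
    """Transformation step: attach a 'role' attribute to every artifact."""
    core, _ = core_periphery(a["author"] for a in artifacts)
    return [
        {**a, "role": "core" if a["author"] in core else "periphery"}
        for a in artifacts
    ]


# Toy activity profile with a heavy tail, invented for this example.
authors = ["ann"] * 40 + ["bo"] * 30 + ["cy"] * 3 + ["di"] * 2 + ["ed"]
core, periphery = core_periphery(authors)
tagged = tag_roles([{"author": "ann"}, {"author": "ed"}] + [{"author": "ann"}] * 2)
print(sorted(core), sorted(periphery))
```

Working on log counts is a natural fit for heavy-tailed activity data, where a few contributors write most messages; a full hierarchical clustering would repeat such splits recursively.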

Validation in BioJava, Biopython and BioPerl OSS: Spam Detection

Figure: BioJava

Spam and spammer level in mailing lists of OSS
- Significant amount (up to 60%)
- Non-monotonic
- Distortion of dynamics

Validation in BioJava, Biopython and BioPerl OSS: Results Distortion

Figure: Year 2004, BioJava

Mood within project community
- Summarized sentiment of project mails per month
- Positive sentiment of spam advertisement
- Incorrect sentiment assignment due to quotation

Adaptive Filter-Framework and OSS Analysis

- OSS Analysis for SE
    – Methods/metrics for knowledge mining in company communication and development repositories
    – Understanding of community-oriented development: principles, obstacles and advantages
! Data Cleaning: Results are only as good as the data!
- Adaptive Filter-Framework
    – Significant noise level in data
    – Adaptable for any Web artifact format
    – Filter nesting
    – Filter as analysis method

