Developing recommendation systems to support open-source software developers: challenges and lessons learned
Davide Di Ruscio
Dipartimento di Ingegneria e Scienze dell’Informazione e Matematica
Università degli Studi dell’Aquila
http://people.disim.univaq.it/diruscio/
davide.diruscio@univaq.it
@ddiruscio
2
Who am I?
http://people.disim.univaq.it/diruscio/
3
Development of complex software systems by reusing
third-party open source components
Recommendation systems in Software Engineering
4
5
https://www.slideshare.net/CrossingMinds/recommendation-system-explained?from_action=save
6
Problem domain
Recommendation systems (RS) help to match users with items
– Ease information overload
– Sales assistance (guidance, advisory, persuasion,…)
Different system designs / paradigms
– Based on availability of exploitable data
– Implicit and explicit user feedback
– Domain characteristics
RS are software agents that elicit the interests and preferences of individual consumers
[…] and make recommendations accordingly. They have the potential to support and
improve the quality of the decisions consumers make while searching for and selecting
products online.
[Xiao & Benbasat, MISQ, 2007]
http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
8
Recommender systems
RS seen as a function
Given:
– User model (e.g. ratings, preferences, demographics, situational context)
– Items (with or without description of item characteristics)
Find:
– Relevance score. Used for ranking.
Finally:
– Recommend items that are assumed to be relevant
But:
• Remember that relevance might be context-dependent
• Characteristics of the list itself might be important (diversity)
http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
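The "RS seen as a function" view can be sketched in a few lines of Python. This is only an illustration: the scoring rule (a dot product between a user preference vector and item feature vectors), the function names, and the toy data are assumptions, not something taken from the slides.

```python
from typing import Dict, List, Tuple


def relevance(user_model: Dict[str, float], item_features: Dict[str, float]) -> float:
    """Hypothetical relevance score: dot product of preferences and item features."""
    return sum(weight * item_features.get(feature, 0.0)
               for feature, weight in user_model.items())


def recommend(user_model: Dict[str, float],
              items: Dict[str, Dict[str, float]],
              top_n: int = 2) -> List[Tuple[str, float]]:
    """Score every item, rank by descending score, and return the top-N."""
    scored = [(name, relevance(user_model, feats)) for name, feats in items.items()]
    # Sort by score (highest first); break ties by name for a stable ranking.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy user model and item catalogue (made-up names and weights).
user = {"json": 1.0, "testing": 0.5}
catalog = {
    "lib-a": {"json": 1.0},
    "lib-b": {"testing": 1.0},
    "lib-c": {"logging": 1.0},
}

ranking = recommend(user, catalog)
print(ranking)
```

Context dependence and list diversity, mentioned in the "But" bullets, are deliberately left out of this sketch.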
9
Paradigms of recommender systems
Recommender systems reduce information overload by estimating relevance
– Personalized recommendations
– Collaborative: "Tell me what's popular among my peers"
– Content-based: "Show me more of what I've liked"
– Knowledge-based: "Tell me what fits based on my needs"
– Hybrid: combinations of various inputs and/or composition of different mechanisms
http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
15
Collaborative-Filtering Technique
■ Providing users with suggestions that fit their taste/need
■ Being based on the preferences of similar users
20
Collaborative-Filtering Technique
User-item matrix: Ratings given to Pizza restaurants by customers
      R1  R2  R3
c1     5   5   2
c2     3   3   4
c3     5   5   ?
Internal Meeting, 31 October 2017
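As a rough illustration of how the missing rating of c3 for R3 could be estimated, the sketch below uses user-based collaborative filtering on the matrix above. The similarity function (1 / (1 + Euclidean distance) over co-rated restaurants) and the weighted-average prediction are assumptions chosen for simplicity; the slides do not prescribe a specific formula.

```python
import math

# The user-item matrix from the slide; c3 has not rated R3 yet.
ratings = {
    "c1": {"R1": 5, "R2": 5, "R3": 2},
    "c2": {"R1": 3, "R2": 3, "R3": 4},
    "c3": {"R1": 5, "R2": 5},
}


def similarity(a, b):
    """Assumed similarity: 1 / (1 + Euclidean distance) over co-rated items."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = math.sqrt(sum((a[item] - b[item]) ** 2 for item in shared))
    return 1.0 / (1.0 + distance)


def predict(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave to `item`."""
    numerator = denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(ratings[user], other_ratings)
        numerator += weight * other_ratings[item]
        denominator += weight
    return numerator / denominator if denominator else None


predicted = predict(ratings, "c3", "R3")
print(round(predicted, 2))
```

Because c3 agrees exactly with c1 on R1 and R2, the prediction is pulled towards c1's rating of 2 rather than c2's rating of 4.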
21
Recommendation Systems in Software Engineering
A recommendation system in software
engineering is
“. . . a software application that provides
information items estimated to be
valuable for a software engineering task
in a given context.”
24
Context
Development of new software systems by reusing existing open source components
Related activities
- Searching for candidate components
- Evaluating a set of retrieved candidate components to find the most suitable one
- Understanding how to use the selected components
- Monitoring the selected components
25
Context
- Source code
- Q&A systems
- Bug reports
- API documentation
- Tutorials
- Configuration management systems
27
Selecting and Using OSS components
Challenging tasks
- Assessing quality, maturity, development activity, and user support is not a straightforward process
Different and heterogeneous sources of information
- e.g., code repositories, communication channels, bug tracking systems
[Figure labels: source code, Q&A systems, bug reports, API documentation, tutorials, configuration management systems]
29
Intelligent IDEs
[Figure labels: query, recommendation, feed, mine, Knowledge Base, training, prediction, Mining and Data Extraction]
Advanced IDEs
– Incorporating various recommendation and Machine Learning techniques
– Aiming to efficiently and effectively mine the existing open-source software repositories
30
Examples of recommendations
Use of machine learning algorithms to produce recommendations during
development:
– Depending on the set of selected third-party libraries, the system is able to recommend
additional libraries that should be included in the project being developed
– Given a selected library, the system is able to suggest alternative ones that share some
similarities with the selected one
– Depending on the set of selected libraries, the system shows API documentation and Q&A
posts that can help developers to understand how to use the selected libraries
– During the development, developers get recommendations about API function calls and usage
patterns that might be used
– …
31
Eclipse SCAVA project
eclipse.org/scava
32
The CROSSMINER Recommendation Systems
CrossSim – Recommending similar projects
CrossRec – Recommending third-party libraries
FOCUS – Recommending API function calls and usage patterns
MNBN – Recommending GitHub topics
PostFinder – Recommending Stack Overflow posts
CrossRec
Recommending
third-party libraries
Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Massimiliano Di Penta: CrossRec: Supporting software developers by
recommending third-party libraries. J. Syst. Softw. 161 (2020)
34
35
University of L'Aquila
WCRE 2013 - http://ieeexplore.ieee.org/document/6671293/
CROSSMINER Lisbon Meeting, 27-28 February 2018
36
LibRec: Automated Library Recommendation
– can be considered as the most advanced technique for library recommendation
– finds relevant libraries, based on the current set of libraries that a project
already includes
– is able to recommend project libraries with high recall rates
37
Collaborative-Filtering Recommendation
◼ User-item matrix: Ratings given to Pizza restaurants by customers
      R1  R2  R3
C1     5   5   2
C2     3   3   4
C3     5   5   ?
◼ Unknown ratings can be deduced from the most similar customers
38
CrossRec
40
CrossRec: Projects-Libraries Representation
◼ Representing the project-library relationships using a
user-item ratings matrix
◼ Predict the inclusion of additional libraries
41
Predict the inclusion of additional libraries
◼ Missing “ratings” can be predicted using
collaborative-filtering techniques
◼ The row-wise and column-wise relationships are
exploited to compute missing ratings
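The idea of treating projects as "users" and libraries as "items" can be sketched as follows. This is a simplified illustration and not the published CrossRec algorithm: the cosine similarity on binary library sets, the neighbourhood size, and all project and library data are assumptions made for the example.

```python
import math

# Hypothetical project-library matrix, stored as the set of libraries per project.
projects = {
    "p1": {"junit", "slf4j", "guava"},
    "p2": {"junit", "log4j"},
    "p3": {"gson", "jackson"},
}
active = {"junit", "slf4j"}  # libraries already used by the project under development


def cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_libraries(active, projects, k=2, top_n=3):
    """Predict the missing 'ratings' of the active project from its k most similar projects."""
    neighbours = sorted(
        ((cosine(active, libs), name) for name, libs in projects.items()),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return []
    scores = {}
    for sim, name in neighbours:
        # Only libraries the active project does not include yet are candidates.
        for lib in projects[name] - active:
            scores[lib] = scores.get(lib, 0.0) + sim / total
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


recommendations = recommend_libraries(active, projects)
print(recommendations)
```

Here guava is ranked above log4j because p1 shares two libraries with the active project, whereas p2 shares only one.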
42
Evaluation
1,200 GitHub Java projects
43
CrossRec: The evaluation process
44
Ten-fold cross validation
The dataset was divided into ten equal parts, so-called folds
The validation has been conducted in ten rounds
For each round, nine folds are used as training data, and the remaining
fold is used as testing data
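The ten-fold procedure described above can be sketched as follows. The helper name `ten_fold_splits`, the random seed, and the placeholder project names are assumptions for illustration; only the 1,200-project dataset size comes from the slides.

```python
import random


def ten_fold_splits(items, n_folds=10, seed=42):
    """Yield (training, testing) pairs: each fold is the testing set exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Distribute the shuffled items round-robin into n_folds parts.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        testing = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, testing


# Placeholder identifiers standing in for the evaluated projects.
projects = [f"project-{i}" for i in range(1200)]
splits = list(ten_fold_splits(projects))

for round_no, (training, testing) in enumerate(splits, start=1):
    print(round_no, len(training), len(testing))
```

With 1,200 projects, every round trains on 1,080 projects and tests on the remaining 120.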
45
Running Example
47
Evaluation Metrics
◼ Recall Rate: the share of projects for which the recommender system returns at least one match among the top-N recommended items
◼ Accuracy: Precision and Recall
48
Evaluation Metrics
◼ Sales Diversity: the ability of the system to suggest as many libraries as possible to projects
◼ Novelty: measures whether the system is able to expose new and useful libraries to projects
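A minimal sketch of how some of these metrics could be computed is shown below. The function names and toy data are assumptions. Sales diversity is approximated here by catalogue coverage (the fraction of all libraries that appear in at least one top-N list), which is one common formulation and not necessarily the one used in the evaluation. Novelty is omitted because the slides do not give its formula.

```python
def recall_rate(recommendations, ground_truth, n):
    """Share of projects with at least one correct library among the top-N."""
    hits = sum(1 for project in recommendations
               if set(recommendations[project][:n]) & ground_truth[project])
    return hits / len(recommendations)


def precision_recall(recommended, relevant, n):
    """Precision@N and Recall@N for a single project."""
    matched = len(set(recommended[:n]) & relevant)
    return matched / n, matched / len(relevant)


def catalog_coverage(recommendations, catalog, n):
    """Assumed proxy for sales diversity: fraction of the catalogue ever recommended."""
    shown = {lib for ranked in recommendations.values() for lib in ranked[:n]}
    return len(shown) / len(catalog)


# Toy ranked lists and ground-truth libraries (made-up data).
recs = {"p1": ["a", "b", "c"], "p2": ["d", "e", "f"]}
truth = {"p1": {"a", "x"}, "p2": {"y"}}
catalog = {"a", "b", "c", "d", "e", "f", "x", "y"}

print(recall_rate(recs, truth, 3))
print(precision_recall(recs["p1"], truth["p1"], 3))
print(catalog_coverage(recs, catalog, 3))
```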
49
Recall Rate@N
50
Accuracy
FOCUS
Recommending
API function calls and
usage patterns
Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas Degueule, Massimiliano Di Penta: FOCUS: a recommender
system for mining API function calls and usage patterns. ICSE 2019: 1050-1060
52
Problem
“Which API methods should this piece of client code
invoke, considering that it has already invoked these
other API methods?”
53
FOCUS: Recommending APIs and code snippets
54
Context-aware recommendation
University of L'Aquila, CROSSMINER Toulouse Meeting, 10-12 June 2018
Predict the inclusion of additional invocations
55
Representation of Projects-MDs-MIs (method declarations and method invocations)
3D user-item-context ratings matrix
Mappings:
– contexts ←→ projects
– users ←→ declarations
– items ←→ invocations
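The mapping above can be made concrete with a small sketch that builds the 3D matrix from (project, declaration, invocation) triples. The triples, the names, and the binary "rating" (1 if the declaration calls the invocation in that project, 0 otherwise) are assumptions for illustration.

```python
# Hypothetical triples mined from source code: (project, declaration, invocation).
triples = [
    ("projA", "Reader.load", "File.open"),
    ("projA", "Reader.load", "File.read"),
    ("projA", "Writer.save", "File.open"),
    ("projB", "Parser.parse", "File.open"),
]


def build_matrix(triples):
    """Return the three axes and a nested list indexed as matrix[project][declaration][invocation]."""
    projects = sorted({p for p, _, _ in triples})
    declarations = sorted({d for _, d, _ in triples})
    invocations = sorted({i for _, _, i in triples})
    present = set(triples)
    matrix = [
        [[1 if (p, d, i) in present else 0 for i in invocations]
         for d in declarations]
        for p in projects
    ]
    return projects, declarations, invocations, matrix


projects, declarations, invocations, matrix = build_matrix(triples)
for project, layer in zip(projects, matrix):
    print(project)
    for declaration, row in zip(declarations, layer):
        print("  ", declaration, row)
```

Each project contributes one 2D slice (declarations by invocations), so the context dimension is simply the outermost index.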
56
Recommendation engine: API function calls
Generation of a ranked list of API function calls
• Additional invocations for the active declaration are predicted by
computing the missing ratings
• Ranked list of invocations with scores in descending order
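A simplified way to produce such a ranked list is sketched below. It is not the exact FOCUS formula: it assumes Jaccard similarity between the active declaration and known declarations, ignores the project (context) dimension, and uses made-up invocation names.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of invocations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_invocations(active, known_declarations):
    """Score invocations missing from the active declaration using similar declarations."""
    scores = {}
    total = 0.0
    for invocations in known_declarations.values():
        weight = jaccard(active, invocations)
        if weight == 0:
            continue  # dissimilar declarations contribute nothing
        total += weight
        for invocation in invocations - active:
            scores[invocation] = scores.get(invocation, 0.0) + weight
    if total == 0:
        return []
    # Normalise the scores and sort them in descending order.
    return sorted(((inv, score / total) for inv, score in scores.items()),
                  key=lambda kv: (-kv[1], kv[0]))


# Hypothetical declarations with the invocations they contain.
known = {
    "d1": {"open", "read", "close"},
    "d2": {"open", "write"},
    "d3": {"connect", "send"},
}
active = {"open", "read"}  # invocations already written in the active declaration

ranked = rank_invocations(active, known)
print(ranked)
```

In this toy example "close" ranks first because d1 overlaps most with the active declaration.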
57
Recommendation engine: API usage patterns
From the ranked list, the top-N method invocations are used as a query to search for relevant declarations
Source code snippets containing the identified relevant declarations are retrieved from the available source code base
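The retrieval step can be illustrated with the following sketch. The in-memory code base, the Jaccard-based matching, and the function name `find_snippets` are assumptions; a real system would query an index over mined repositories.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of invocations."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_snippets(query_invocations, code_base, top_k=1):
    """Return the snippets of the declarations that best match the query invocations."""
    query = set(query_invocations)
    ranked = sorted(
        code_base,
        key=lambda entry: (-jaccard(query, entry["invocations"]), entry["declaration"]),
    )
    return [entry["snippet"] for entry in ranked[:top_k]]


# Hypothetical code base: each declaration with its invocations and source snippet.
code_base = [
    {
        "declaration": "readFile",
        "invocations": {"open", "read", "close"},
        "snippet": "f = open(path); data = f.read(); f.close()",
    },
    {
        "declaration": "sendMessage",
        "invocations": {"connect", "send"},
        "snippet": "s.connect(addr); s.send(msg)",
    },
]

# The top-N recommended invocations act as the query.
snippets = find_snippets(["open", "read", "close"], code_base)
print(snippets[0])
```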
58
MNBN
Recommending
GitHub topics
Claudio Di Sipio, Riccardo Rubei, Davide Di Ruscio, Phuong T. Nguyen: A Multinomial Naïve Bayesian (MNB) Network to
Automatically Recommend Topics for GitHub Repositories. EASE 2020: 71-80
60
GitHub topics
61
Proposed approach
A Naïve Bayesian network is a probabilistic model based on Bayes' theorem, which expresses the probability of a certain
event given a set of preconditions
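To show the principle, here is a small from-scratch multinomial Naïve Bayes classifier with Laplace smoothing that ranks topics for a repository description. It is a sketch under simplifying assumptions: each training repository has a single topic, the tokens and topics are made up, and the class is not the implementation used in the MNBN paper.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_docs = Counter(labels)          # number of documents per topic
        self.word_counts = defaultdict(Counter)    # token counts per topic
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_scores(self, tokens):
        """log P(topic) + sum of log P(token | topic) for every known topic."""
        vocabulary_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / self.total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocabulary_size))
            scores[label] = score
        return scores

    def recommend(self, tokens, top_k=2):
        """Return the top-k topics ordered by descending posterior score."""
        scores = self.log_scores(tokens)
        return sorted(scores, key=lambda label: (-scores[label], label))[:top_k]


# Toy training data: tokenised README text and one topic per repository.
documents = [
    ["neural", "network", "training"],
    ["deep", "neural", "model"],
    ["http", "server", "routing"],
    ["rest", "http", "api"],
    ["sql", "query", "index"],
]
labels = ["machine-learning", "machine-learning", "web", "web", "database"]

model = MultinomialNB().fit(documents, labels)
topics = model.recommend(["neural", "model", "api"])
print(topics)
```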
62
Example of repositories, their topics and the recommended topics
63
Evaluation
64
Development of the CROSSMINER
recommendation systems: main activities
65
Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio, Phuong T. Nguyen, Riccardo Rubei. Development of recommendation systems for software engineering:
the CROSSMINER experience, Accepted for publication at the Empirical Software Engineering Journal (2021) preprint https://arxiv.org/abs/2103.06987
66
Requirement elicitation phase: main challenge
Clear understanding of the needed recommendation systems:
• Understanding the functionalities that the final users expect from the envisioned
recommendation systems
• Otherwise, you risk spending time developing systems whose recommendations are
not relevant or in line with the actual user needs.
67
Requirement elicitation phase: main challenge
Applied solution
– We implemented demo projects that reflected real-world scenarios
– The demos included explanatory context inputs and the corresponding recommendation
items that the envisioned recommendation systems should be able to produce.
68
Development phase: main challenge
Clear awareness of existing recommendation techniques
– Over the last decades, several recommendation systems have been developed
by both academia and industry
– It is crucial to have a clear knowledge of the possible techniques and patterns
that might be employed to develop new ones
– Since the solution space is extensive, comparing and evaluating candidate
approaches can be a very daunting task
69
Development phase: main challenge
Applied solution
– Significant effort has been devoted to analyzing existing approaches that might
be used as starting points.
[Figure labels: Data Preprocessing; Capturing Context; Producing Recommendations; Presenting Recommendations]
70
71
Evaluation phase: main challenge
There is no golden rule for evaluating all possible recommendation
systems due to their intrinsic features as well as heterogeneity
– Which evaluation methodology is suitable?
– Which metric(s) can be used?
– Which dataset is eligible/available for evaluation?
– Which baseline(s) can be compared with?
72
Evaluation phase: Ten-fold cross validation
The dataset was divided into ten equal parts, so-called folds
The validation has been conducted in ten rounds
For each round, nine folds are used as training data, and the remaining
fold is used as testing data
73
Evaluation phase: some CROSSMINER facts
74
Lessons learned
User scepticism: target users might be sceptical about the relevance of the
items that can be recommended.
Quality of data: large amounts of high-quality data are needed for training and
evaluation activities
– Data quality cannot be defined in general; it very much depends on
the particular application of interest
Baseline availability: it is not always possible to reuse the tools and data of the
identified baselines
– In our case, k-fold cross-validation came to the rescue
– We reimplemented the related tools only for CrossSim
77
Lessons learned
In the case of the FOCUS evaluation, one of the considered datasets initially
consisted of 5,147 Java projects retrieved from the Software Heritage
archive
To comply with the requirements of the baseline, we first restricted the
dataset to the projects that use at least one of the considered third-party
libraries.
To comply with the requirements of FOCUS, we restricted the dataset to
those projects containing at least one pom.xml file
Because of these constraints, we ended up with a dataset consisting of 610
Java projects
– we had to collect a dataset roughly ten times bigger than the one actually used for the evaluation (5,147 vs. 610 projects)
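The two restrictions can be expressed as a simple filter. The project records, field names, and library names below are hypothetical; they only illustrate how both constraints shrink the dataset.

```python
def filter_projects(projects, considered_libraries):
    """Keep projects that use a considered library AND contain a pom.xml file."""
    kept = []
    for project in projects:
        uses_library = bool(set(project["libraries"]) & considered_libraries)
        has_pom = any(path.split("/")[-1] == "pom.xml" for path in project["files"])
        if uses_library and has_pom:
            kept.append(project["name"])
    return kept


# Hypothetical project metadata.
candidates = [
    {"name": "alpha", "libraries": ["guava"], "files": ["pom.xml", "src/Main.java"]},
    {"name": "beta", "libraries": ["guava"], "files": ["build.gradle"]},
    {"name": "gamma", "libraries": ["left-pad"], "files": ["module/pom.xml"]},
    {"name": "delta", "libraries": ["junit", "gson"], "files": ["core/pom.xml"]},
]
considered = {"guava", "gson"}

selected = filter_projects(candidates, considered)
print(selected)
```

Only projects that satisfy both constraints survive, which is why the usable dataset ends up much smaller than the collected one.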
78
79
What’s next
Adversarial Machine Learning
– Manipulating training data to perturb recommendations
– Understanding attacks to recommender systems
– Finding decent countermeasures
80
What’s next
Dealing with time-series data in Software Engineering with deep
learning
– Recommending third-party library updates for Android apps
– Predicting code insertion for LSP-based notations, e.g., Visual Studio Code,
Theia
– Predicting model fragment insertion for GLSP-based notations, e.g., EMF cloud,
Sprotty for visual languages
81
What’s next
82
Claudio Di Sipio, Davide Di Ruscio, Phuong T. Nguyen: Democratizing the development of recommender systems by means of low-code
platforms. MODELS Companion 2020: 68:1-68:9
83
84
85
Some additional links
- http://www.ossmeter.org
- http://www.crossminer.org
- http://www.eclipse.org/scava
86
Main references (1/3)
● Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio, Phuong T. Nguyen, Riccardo Rubei,
“Development of recommendation systems for software engineering: the CROSSMINER experience,”
Empirical Software Engineering (EMSE), 2021, pre-print https://arxiv.org/abs/2103.06987
● Phuong T. Nguyen, Juri Di Rocco, Claudio Di Sipio, Davide Di Ruscio, Massimiliano Di Penta,
“Recommending API Function Calls and Code Snippets to Support Software Development,” IEEE
Transactions on Software Engineering (TSE), 2021, ISSN: 1939-3520, DOI:
10.1109/TSE.2021.3059907
● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Massimiliano Di Penta, “CrossRec: Supporting
Software Developers by Recommending Third-party Libraries,” Journal of Systems and Software
(JSS), 2020, ISSN: 0164-1212, DOI: 10.1016/j.jss.2019.110460
● Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio, “An Automated Approach to
Assess the Similarity of GitHub Repositories,” Software Quality Journal (SQJ), 2020, ISSN: 0963-9314,
DOI: 10.1007/s11219-019-09483-0
87
Main references (2/3)
● Andrea Capiluppi, Davide Di Ruscio, Juri Di Rocco, Phuong T. Nguyen, Nemitari Ajienka, “Detecting
Java Software Similarities by using Different Clustering Techniques,” Information and Software
Technology (IST), 2020, ISSN: 0950-5849, DOI: 10.1016/j.infsof.2020.106279
● Riccardo Rubei, Claudio Di Sipio, Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, “PostFinder:
Mining Stack Overflow posts to support software developers,” Information and Software
Technology (IST), 2020, ISSN: 0950-5849, DOI: 10.1016/j.infsof.2020.106367
● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas Degueule, Massimiliano
Di Penta, “FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns,” In
Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, DOI:
10.1109/ICSE.2019.00109
88
Main references (3/3)
● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Alfonso Pierantonio, Ludovico Iovino,
“Automated Classification of Metamodel Repositories: A Machine Learning Approach,” MODELS 2019,
DOI: 10.1109/MODELS.2019.00011
● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio: “Enabling heterogeneous recommendations in
OSS development: what’s done and what’s next in CROSSMINER” In Proceedings of the 23rd Int.
Conf. on Evaluation and Assessment on Software Engineering, EASE 2019, DOI:
10.1145/3319008.3319353
● Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio, “CrossSim: exploiting mutual
relationships to detect similar OSS projects,” In Proceedings of the 44th Euromicro Conference on
Software Engineering and Advanced Applications (SEAA 2018), ISBN: 978-1-5386-7383-6, DOI:
10.1109/SEAA.2018.00069
89
Thanks to Juri Di Rocco, Claudio Di Sipio, Phuong T. Nguyen, Alfonso Pierantonio, and Riccardo Rubei for some of the slides used

Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsJaydeep Chhasatia
 
Streamlining Your Application Builds with Cloud Native Buildpacks
Streamlining Your Application Builds  with Cloud Native BuildpacksStreamlining Your Application Builds  with Cloud Native Buildpacks
Streamlining Your Application Builds with Cloud Native BuildpacksVish Abrams
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Jaydeep Chhasatia
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorShane Coughlan
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyRaymond Okyere-Forson
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?AmeliaSmith90
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionsNirav Modi
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmonyelliciumsolutionspun
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadIvo Andreev
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptkinjal48
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIIvo Andreev
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxJoão Esperancinha
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntelliSource Technologies
 
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageSales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageDista
 

Kürzlich hochgeladen (20)

Deep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - DatacampDeep Learning for Images with PyTorch - Datacamp
Deep Learning for Images with PyTorch - Datacamp
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdfARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
ARM Talk @ Rejekts - Will ARM be the new Mainstream in our Data Centers_.pdf
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdf
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
 
Streamlining Your Application Builds with Cloud Native Buildpacks
Streamlining Your Application Builds  with Cloud Native BuildpacksStreamlining Your Application Builds  with Cloud Native Buildpacks
Streamlining Your Application Builds with Cloud Native Buildpacks
 
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
Optimizing Business Potential: A Guide to Outsourcing Engineering Services in...
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
AI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human BeautyAI Embracing Every Shade of Human Beauty
AI Embracing Every Shade of Human Beauty
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
eAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspectionseAuditor Audits & Inspections - conduct field inspections
eAuditor Audits & Inspections - conduct field inspections
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
Webinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.pptWebinar_050417_LeClair12345666777889.ppt
Webinar_050417_LeClair12345666777889.ppt
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
Fields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptxFields in Java and Kotlin and what to expect.pptx
Fields in Java and Kotlin and what to expect.pptx
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptx
 
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales CoverageSales Territory Management: A Definitive Guide to Expand Sales Coverage
Sales Territory Management: A Definitive Guide to Expand Sales Coverage
 

Developing recommendation systems to support open-source software developers: challenges and lessons learned

  • 1. Developing recommendation systems to support open-source software developers: challenges and lessons learned. Davide Di Ruscio, Dipartimento di Ingegneria e Scienze dell’Informazione e Matematica, Università degli Studi dell’Aquila. http://people.disim.univaq.it/diruscio/ davide.diruscio@univaq.it @ddiruscio
  • 3. Development of complex software systems by reusing third-party open source components. Recommendation systems in Software Engineering
  • 6. Problem domain. Recommendation systems (RS) help to match users with items: they ease information overload and provide sales assistance (guidance, advisory, persuasion, …). Different system designs / paradigms exist, based on the availability of exploitable data, implicit and explicit user feedback, and domain characteristics. "RS are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online." [Xiao & Benbasat, MISQ, 2007] http://clgiles.ist.psu.edu/IST441/materials/powerpoint/RC/rec.pptx
  • 8. Recommender systems: RS seen as a function. Given: a user model (e.g. ratings, preferences, demographics, situational context) and items (with or without a description of item characteristics). Find: a relevance score, used for ranking. Finally: recommend items that are assumed to be relevant. But: remember that relevance might be context-dependent, and characteristics of the list itself might be important (diversity)
  • 9. Paradigms of recommender systems: recommender systems reduce information overload by estimating relevance
  • 10. Paradigms of recommender systems: personalized recommendations
  • 11. Paradigms of recommender systems. Collaborative: "Tell me what's popular among my peers"
  • 12. Paradigms of recommender systems. Content-based: "Show me more of what I've liked"
  • 13. Paradigms of recommender systems. Knowledge-based: "Tell me what fits based on my needs"
  • 14. Paradigms of recommender systems. Hybrid: combinations of various inputs and/or composition of different mechanisms
  • 16. Collaborative-Filtering Technique: provides users with suggestions that fit their taste/need, based on the preferences of similar users
  • 20. Collaborative-Filtering Technique. User-item matrix: ratings given to pizza restaurants (R1 to R3) by customers (c1 to c3). Internal Meeting, 31 October 2017

         R1  R2  R3
     c1   5   5   2
     c2   3   3   4
     c3   5   5   ?
  • 21. Recommendation Systems in Software Engineering. A recommendation system in software engineering is “… a software application that provides information items estimated to be valuable for a software engineering task in a given context.”
  • 22. Development of complex software systems by reusing third-party open source components. Recommendation systems in Software Engineering
  • 24. Context: development of new software systems by reusing existing open source components. Related activities: searching for candidate components; evaluating a set of retrieved candidate components to find the most suitable one; understanding how to use the selected components; monitoring the selected components
  • 25. Context: source code, Q&A systems, bug reports, API documentation, tutorials, configuration management systems
  • 27. Selecting and Using OSS components. Challenging tasks: assessing quality, maturity, activity of development, and user support is not a straightforward process. Different and heterogeneous sources of information: e.g., code repositories, communication channels, bug tracking systems
  • 29. Intelligent IDEs: advanced IDEs incorporating various recommendation and machine learning techniques, aiming to efficiently and effectively mine the existing open-source software repositories. [Figure labels: query, recommendation, feed, mine, Knowledge Base, training, prediction, Mining and Data Extraction]
  • 30. Examples of recommendations. Machine learning algorithms are used to produce recommendations during development: depending on the set of selected third-party libraries, the system is able to recommend additional libraries that should be included in the project being developed; given a selected library, the system is able to suggest alternative ones that share some similarities with it; depending on the set of selected libraries, the system shows API documentation and Q&A posts that can help developers understand how to use them; during development, developers get recommendations about API function calls and usage patterns that might be used; …
  • 32. The CROSSMINER Recommendation Systems. CrossSim: recommending similar projects. CrossRec: recommending third-party libraries. FOCUS: recommending API function calls and usage patterns. MNBN: recommending GitHub topics. PostFinder: recommending Stack Overflow posts
  • 33. CrossRec: Recommending third-party libraries. Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Massimiliano Di Penta: CrossRec: Supporting software developers by recommending third-party libraries. J. Syst. Softw. 161 (2020)
  • 35. WCRE 2013 - http://ieeexplore.ieee.org/document/6671293/ University of L'Aquila, CROSSMINER Lisbon Meeting, 27-28 February 2018
  • 36. LibRec: Automated Library Recommendation. It can be considered the most advanced technique for library recommendation; it finds relevant libraries based on the set of libraries that a project already includes; it is able to recommend libraries with high recall rates
  • 37. Collaborative-Filtering Recommendation. User-item matrix: ratings given to pizza restaurants (R1 to R3) by customers (C1 to C3). Unknown ratings can be deduced from the most similar customers

         R1  R2  R3
     C1   5   5   2
     C2   3   3   4
     C3   5   5   ?
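The pizza example above can be worked through in a few lines of Python. This is only a minimal sketch of user-based collaborative filtering: the similarity measure (inverse Euclidean distance over co-rated restaurants) and the names `similarity` and `predict` are assumptions made for illustration, not something taken from the slides.

```python
import math

# Ratings from the slide: rows are customers, columns are restaurants R1..R3.
# None marks the unknown rating to be estimated.
ratings = {
    "C1": [5, 5, 2],
    "C2": [3, 3, 4],
    "C3": [5, 5, None],
}

def similarity(u, v):
    """Similarity in (0, 1]: inverse Euclidean distance over co-rated items."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return 0.0
    dist = math.sqrt(sum((a - b) ** 2 for a, b in pairs))
    return 1.0 / (1.0 + dist)

def predict(ratings, user, item, k=1):
    """Estimate a missing rating from the k most similar users who rated the item."""
    neighbours = [
        (similarity(ratings[user], row), row[item])
        for other, row in ratings.items()
        if other != user and row[item] is not None
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbours[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None
    # Similarity-weighted average of the neighbours' ratings.
    return sum(sim * rating for sim, rating in top) / total

print(predict(ratings, "C3", 2, k=1))
```

C3 rated R1 and R2 exactly like C1, so with `k=1` the estimate for R3 is simply C1's rating of 2. With `k=2`, C2's rating of 4 is blended in with a much smaller weight.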
  • 40. CrossRec: Projects-Libraries Representation. The project-library relationships are represented using a user-item ratings matrix, which is used to predict the inclusion of additional libraries
  • 41. Predict the inclusion of additional libraries. Missing “ratings” can be predicted using collaborative-filtering techniques; the row-wise and column-wise relationships are exploited to compute missing ratings
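As a rough illustration of the project-library idea, the sketch below treats each project as a row of a binary matrix (a set of included libraries) and scores missing libraries from the most similar projects. It is a simplified stand-in, assuming plain Jaccard similarity, and is not the similarity computation used by CrossRec itself. The project and library names are made up.

```python
# Each project is a row of the binary project-library matrix, stored as a set.
projects = {
    "p1": {"junit", "slf4j", "guava"},
    "p2": {"junit", "slf4j", "log4j"},
    "p3": {"junit", "mockito"},
    "active": {"junit", "slf4j"},
}

def jaccard(a, b):
    """Overlap between two library sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(projects, active, top_n=3, k=2):
    """Rank libraries missing from `active` using its k most similar projects."""
    own = projects[active]
    neighbours = sorted(
        ((jaccard(own, libs), name) for name, libs in projects.items() if name != active),
        reverse=True,
    )[:k]
    scores = {}
    for sim, name in neighbours:
        # Every library the neighbour has and the active project lacks gets a vote.
        for lib in projects[name] - own:
            scores[lib] = scores.get(lib, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lib for lib, _ in ranked[:top_n]]

print(recommend(projects, "active"))
```

Here `p1` and `p2` are the two closest projects, so `guava` and `log4j` are suggested. `mockito` is not, because `p3` falls outside the neighbourhood.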
  • 42. Evaluation: 1,200 GitHub Java projects
  • 43. CrossRec: The evaluation process
  • 44. Ten-fold cross validation. The dataset was divided into ten equal parts, so-called folds. The validation was conducted in ten rounds: in each round, nine folds are used as training data and the remaining fold is used as testing data
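The fold rotation described above can be sketched as a small generator. The function name `k_fold_splits`, the shuffling, and the fixed seed are illustrative assumptions. The slides do not say how projects were assigned to folds.

```python
import random

def k_fold_splits(items, k=10, seed=42):
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Deal the shuffled items into k folds of (almost) equal size.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# 1,200 project identifiers, matching the dataset size on the evaluation slide.
splits = list(k_fold_splits(range(1200)))
for train, test in splits[:2]:
    print(len(train), len(test))
```

With 1,200 projects, every round trains on 1,080 projects and tests on the remaining 120.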
  • 45. Running Example
  • 47. Evaluation Metrics. Recall Rate: the proportion of projects for which the recommender system returns at least one match among the top-N recommended items. Accuracy: Precision and Recall
  • 48. Evaluation Metrics. Sales Diversity: the ability of the system to suggest as many libraries as possible to projects. Novelty: measures whether a system is able to expose new and useful libraries to projects
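The metrics above can be made concrete with a short sketch. Recall rate, precision, and recall follow the definitions on the slides. Catalog coverage is used here as one common way to quantify sales diversity, which is an assumption because the slides do not give a formula, and novelty is left out for the same reason. All project and library names are invented.

```python
def recall_rate(recs, truth, n):
    """Share of projects with at least one correct library in their top-n list."""
    hits = sum(1 for p in recs if set(recs[p][:n]) & truth[p])
    return hits / len(recs)

def precision_at(rec, relevant, n):
    """Fraction of the top-n recommendations that are correct."""
    return len(set(rec[:n]) & relevant) / n

def recall_at(rec, relevant, n):
    """Fraction of the relevant libraries that appear in the top-n list."""
    return len(set(rec[:n]) & relevant) / len(relevant)

def catalog_coverage(recs, all_libs, n):
    """Share of the library catalog recommended to at least one project."""
    recommended = set()
    for rec in recs.values():
        recommended.update(rec[:n])
    return len(recommended) / len(all_libs)

# Ranked recommendations per test project and the libraries hidden from the system.
recs = {
    "p1": ["guava", "log4j", "gson"],
    "p2": ["junit", "gson", "okhttp"],
}
truth = {"p1": {"log4j", "slf4j"}, "p2": {"mockito"}}
all_libs = {"guava", "log4j", "gson", "junit", "okhttp", "slf4j", "mockito", "jackson"}

print(recall_rate(recs, truth, 3))
print(precision_at(recs["p1"], truth["p1"], 3), recall_at(recs["p1"], truth["p1"], 3))
print(catalog_coverage(recs, all_libs, 3))
```

Only `p1` receives a correct library, so the recall rate is 0.5. Five of the eight catalog libraries are recommended at least once, giving a coverage of 0.625.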
  • 49. Recall Rate@N
  • 50. Accuracy
  • 51. FOCUS: Recommending API function calls and usage patterns. Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas Degueule, Massimiliano Di Penta: FOCUS: a recommender system for mining API function calls and usage patterns. ICSE 2019: 1050-1060
  • 52. Problem: “Which API methods should this piece of client code invoke, considering that it has already invoked these other API methods?”
  • 53. FOCUS: Recommending APIs and code snippets
  • 54. Context-aware recommendation: predict the inclusion of additional invocations. CROSSMINER Toulouse Meeting, 10-12 June 2018
  • 55. Representation of Projects-MDs-MIs (method declarations and method invocations) as a 3D user-item-context ratings matrix. Mappings: contexts ↔ projects; users ↔ declarations; items ↔ invocations
  • 56. Recommendation engine: API function calls. Generation of a ranked list of API function calls: additional invocations for the active declaration are predicted by computing the missing ratings, and the invocations are returned as a ranked list with scores in descending order
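The context-aware idea can be sketched with nested dictionaries standing in for the 3D matrix: project (context), then declaration (user), then the set of invocations (items). The scoring rule below, project similarity times declaration similarity with Jaccard for both, is a deliberately simplified assumption and not the rating formula from the FOCUS paper. All method names are hypothetical.

```python
# project (context) -> declaration (user) -> invoked API methods (items).
corpus = {
    "projA": {
        "readFile": {"File.<init>", "FileReader.<init>",
                     "BufferedReader.readLine", "BufferedReader.close"},
        "log": {"Logger.getLogger", "Logger.info"},
    },
    "projB": {
        "loadConfig": {"File.<init>", "FileReader.<init>", "BufferedReader.readLine"},
    },
}
# The project under development, with the declaration being written.
active_project = {"parse": {"File.<init>", "FileReader.<init>"}}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_invocations(corpus, active_project, active_decl, top_n=5):
    """Score invocations missing from the active declaration, highest first."""
    active_calls = active_project[active_decl]
    project_calls = set().union(*active_project.values())
    scores = {}
    for declarations in corpus.values():
        # Context level: how close is this project to the active one?
        proj_sim = jaccard(project_calls, set().union(*declarations.values()))
        for calls in declarations.values():
            # User level: how close is this declaration to the active one?
            weight = proj_sim * jaccard(active_calls, calls)
            if weight == 0:
                continue
            for call in calls - active_calls:
                scores[call] = scores.get(call, 0.0) + weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

for call, score in rank_invocations(corpus, active_project, "parse"):
    print(call, round(score, 3))
```

`BufferedReader.readLine` ranks first because it appears in similar declarations of both projects. The `Logger` calls are never suggested, since the `log` declaration shares nothing with the active one.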
  • 57. Recommendation engine: API usage patterns. From the ranked list, the top-N method invocations are used as a query to search for relevant declarations. Source code snippets containing the identified relevant declarations are then retrieved from the available source code base
  • 59. MNBN: Recommending GitHub topics. Claudio Di Sipio, Riccardo Rubei, Davide Di Ruscio, Phuong T. Nguyen: A Multinomial Naïve Bayesian (MNB) Network to Automatically Recommend Topics for GitHub Repositories. EASE 2020: 71-80
  • 61. Proposed approach. A Naïve Bayesian network is a probabilistic model based on Bayes' theorem, which expresses the probability of a certain event given a set of preconditions
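To make the Bayesian idea concrete, here is a minimal multinomial Naïve Bayes sketch that ranks topics for a repository from a bag of words. It assumes tokens extracted from README-like text, one topic per training sample, and Laplace smoothing. The data and the names `train` and `rank_topics` are invented for illustration and are not taken from the MNBN implementation.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (tokens, topic). Collect document and term counts per topic."""
    topic_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, topic in samples:
        topic_docs[topic] += 1
        term_counts[topic].update(tokens)
        vocab.update(tokens)
    return topic_docs, term_counts, vocab

def rank_topics(model, tokens):
    """Return topics ordered by log P(topic) + sum of log P(token | topic)."""
    topic_docs, term_counts, vocab = model
    total_docs = sum(topic_docs.values())
    scores = {}
    for topic, n_docs in topic_docs.items():
        total_terms = sum(term_counts[topic].values())
        log_p = math.log(n_docs / total_docs)  # prior
        for tok in tokens:
            if tok not in vocab:
                continue  # unseen words carry no evidence
            # Laplace-smoothed likelihood of the token under this topic.
            log_p += math.log((term_counts[topic][tok] + 1) / (total_terms + len(vocab)))
        scores[topic] = log_p
    return sorted(scores, key=scores.get, reverse=True)

samples = [
    (["neural", "network", "training", "gpu"], "machine-learning"),
    (["gradient", "network", "model"], "machine-learning"),
    (["http", "server", "routing"], "web"),
    (["http", "client", "request", "json"], "web"),
    (["sql", "query", "index"], "database"),
]
model = train(samples)
print(rank_topics(model, ["network", "model", "gpu"]))
```

Taking the first few entries of the ranked list gives a top-N topic recommendation, with `machine-learning` first for this query.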
  • 62. Example of repositories, their topics and the recommended topics
  • 64. Development of the CROSSMINER recommendation systems: main activities
  • 65. Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio, Phuong T. Nguyen, Riccardo Rubei. Development of recommendation systems for software engineering: the CROSSMINER experience. Accepted for publication in the Empirical Software Engineering journal (2021). Preprint: https://arxiv.org/abs/2103.06987
  • 66. Requirement elicitation phase: main challenge. A clear understanding of the needed recommendation systems is required, i.e., of the functionalities that the final users expect from the envisioned recommendation systems. Otherwise, you risk spending time developing systems whose recommendations are not relevant or in line with actual user needs
  • 67. Requirement elicitation phase: applied solution. We implemented demo projects that reflected real-world scenarios. They provided explanatory context inputs and the corresponding recommendation items that the envisioned recommendation systems should be able to produce
  • 68. Development phase: main challenge. Clear awareness of existing recommendation techniques: over the last decades, several recommendation systems have been developed by both academia and industry; it is crucial to have a clear knowledge of the possible techniques and patterns that might be employed to develop new ones; since the solution space is extensive, comparing and evaluating candidate approaches can be a very daunting task
  • 69. Development phase: applied solution. Significant effort has been devoted to analyzing existing approaches that could be used as starting points. [Figure labels: Data Preprocessing, Capturing Context, Producing Recommendations, Presenting Recommendations]
  • 71. Evaluation phase: main challenge. There is no golden rule for evaluating all possible recommendation systems, due to their intrinsic features as well as their heterogeneity. Which evaluation methodology is suitable? Which metric(s) can be used? Which dataset is eligible/available for evaluation? Which baseline(s) can the system be compared with?
  • 72. Evaluation phase: ten-fold cross validation. The dataset was divided into ten equal parts, so-called folds. The validation was conducted in ten rounds: in each round, nine folds are used as training data and the remaining fold is used as testing data
  • 73. Evaluation phase: some CROSSMINER facts
  • 76. Lessons learned. User scepticism: target users might be sceptical about the relevance of the items that can be recommended. Quality of data: it is important to have large amounts of high-quality data available for training and evaluation activities; data quality cannot be defined in general, and it very much depends on the particular application of interest. Baseline availability: it is not always possible to reuse the tools and data of the identified baselines; in our case, k-fold cross validation came to the rescue, and only for CrossSim did we reimplement the related tools
  • 77. Lessons learned. In the case of the FOCUS evaluation, one of the considered datasets initially consisted of 5,147 Java projects retrieved from the Software Heritage archive. To comply with the requirements of the baseline, we first restricted the dataset to the projects that use at least one of the considered third-party libraries. To comply with the requirements of FOCUS, we restricted the dataset to those projects containing at least one pom.xml file. Because of these constraints, we ended up with a dataset consisting of 610 Java projects: we had to create a dataset ten times bigger than the one used for the evaluation
  • 79. What’s next. Adversarial Machine Learning: manipulating training data to perturb recommendations; understanding attacks on recommender systems; finding decent countermeasures
  • 80. What’s next. Dealing with time-series data in Software Engineering with deep learning: recommending third-party library updates for Android apps; predicting code insertion for LSP-based notations, e.g., Visual Studio Code, Theia; predicting model fragment insertion for GLSP-based notations for visual languages, e.g., EMF cloud, Sprotty
  • 82. Claudio Di Sipio, Davide Di Ruscio, Phuong T. Nguyen: Democratizing the development of recommender systems by means of low-code platforms. MODELS Companion 2020: 68:1-68:9
  • 85. Some additional links: http://www.ossmeter.org, http://www.crossminer.org, http://www.eclipse.org/scava
  • 86. Main references (1/3) ● Juri Di Rocco, Davide Di Ruscio, Claudio Di Sipio, Phuong T. Nguyen, Riccardo Rubei, “Development of recommendation systems for software engineering: the CROSSMINER experience,” Empirical Software Engineering (EMSE), 2021, pre-print https://arxiv.org/abs/2103.06987 ● Phuong T. Nguyen, Juri Di Rocco, Claudio Di Sipio, Davide Di Ruscio, Massimiliano Di Penta, “Recommending API Function Calls and Code Snippets to Support Software Development,” IEEE Transactions on Software Engineering (TSE), 2021, ISSN: 1939-3520, DOI: 10.1109/TSE.2021.3059907 ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Massimiliano Di Penta, “CrossRec: Supporting Software Developers by Recommending Third-party Libraries,” Journal of Systems and Software (JSS), 2020, ISSN: 0164-1212, DOI: 10.1016/j.jss.2019.110460 ● Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio, “An Automated Approach to Assess the Similarity of GitHub Repositories,” Software Quality Journal (SQJ), 2020, ISSN: 0963-9314, DOI: 10.1007/s11219-019-09483-0
  • 87. Main references (2/3) ● Andrea Capiluppi, Davide Di Ruscio, Juri Di Rocco, Phuong T. Nguyen, Nemitari Ajienka, “Detecting Java Software Similarities by using Different Clustering Techniques,” Information and Software Technology (IST), 2020, ISSN: 0950-5849, DOI: 10.1016/j.infsof.2020.106279 ● Riccardo Rubei, Claudio Di Sipio, Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, “PostFinder: Mining Stack Overflow posts to support software developers,” Information and Software Technology (IST), 2020, ISSN: 0950-5849, DOI: 10.1016/j.infsof.2020.106367 ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Lina Ochoa, Thomas Degueule, Massimiliano Di Penta, “FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns,” in Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, DOI: 10.1109/ICSE.2019.00109
  • 88. Main references (3/3) ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, Alfonso Pierantonio, Ludovico Iovino, “Automated Classification of Metamodel Repositories: A Machine Learning Approach,” MODELS 2019, DOI: 10.1109/MODELS.2019.00011 ● Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio, “Enabling heterogeneous recommendations in OSS development: what’s done and what’s next in CROSSMINER,” in Proceedings of the 23rd Int. Conf. on Evaluation and Assessment on Software Engineering, EASE 2019, DOI: 10.1145/3319008.3319353 ● Phuong T. Nguyen, Juri Di Rocco, Riccardo Rubei, Davide Di Ruscio, “CrossSim: exploiting mutual relationships to detect similar OSS projects,” in Proceedings of the 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2018), ISBN: 978-1-5386-7383-6, DOI: 10.1109/SEAA.2018.00069
  • 89. Thanks to Juri Di Rocco, Claudio Di Sipio, Phuong T. Nguyen, Alfonso Pierantonio, and Riccardo Rubei for some of the slides used