In January 2018, four software engineering research groups at different Belgian universities launched a five-year research project to nurture the software ecosystems of the future. We assembled a diverse team of about a dozen researchers and embarked on an exciting journey that led to a rich and diverse suite of papers, tools and datasets. Halfway into the project the corona pandemic intervened but, despite several months of lockdown, we succeeded in increasing inter-university collaboration. In this paper we share our achievements so that the BENEVOL community may benefit from our experience.
1. Nurturing the Software Ecosystems of the Future
Achievements of an Inter-University Research Project
Serge Demeyer, Tom Mens, Coen De Roover & Anthony Cleve
secoassist.github.io
@seco-assist
2. Duration: 4 years (2018-2022)
Budget: 2.4 million euros (150k per partner per year)
An "Excellence of Science" research project
5. SECO-ASSIST Research Goals
Today, over 80% of the software used in any IT product or service is open source
Societal challenge
Protect society from the risks and dangers of an increasing dependence on software ecosystems
Fundamental goals
Study and understand the socio-technical characteristics of software ecosystem health, quality and sustainability over time
Predict/assist ecosystem evolution to increase long-term sustainability
Applied goal
Propose automated tools to help software development communities in increasing their productivity, interaction, quality and resilience over time
6. SECO-ASSIST Research Goals
Improve social health
Retain key project contributors and attract new ones
Predict abandoners and suggest replacements
Identify toxic contributors
Ensure sufficient team diversity
7. SECO-ASSIST Research Goals
Improve technical health
Better software tests, taking into account the software dependencies, to reduce bugs and security leaks
Increase productivity and quality by using reusable software libraries
Increase software maintainability by supporting software upgrades and migration to new technologies
Improve interactions and co-evolution between data-intensive software and their database(s)
14. Main results – social networks & development workflows
Socio-technical analysis of software contributor communities
evolution and impact of socio-technical congruence in software packaging ecosystems [ESEC/FSE 2019]
analysis of pull request comments in GitHub repositories [BENEVOL 2019]
probabilistic forecasting model to predict future activity of software project contributors [JSS 2020]
Detecting and analysing bot usage in development projects
identification of key characteristics exhibited by bot activities [BotSE 2020]
ML-based technique for detecting bots based on the repetitiveness of their commenting activity [JSS 2021]
study on the prevalence of bots as contributors in GitHub projects [IEEE Software 2022]
Improving development workflow automation
longitudinal study on the use and evolution of Continuous Integration tools in GitHub [SANER 2022]
large-scale quantitative analysis of the GitHub Actions ecosystem [ICSME 2022]
Studying variant projects in software families
study of the prevalence and importance of project forking in GitHub [BENEVOL 2020, EMSE 2022]
motivations for launching variants and impediments to maintaining the co-existing projects [SANER 2022]
study to quantify the extent of sub-optimal maintenance in software families [ESEC/FSE 2022]
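The idea of detecting bots through the repetitiveness of their comments [JSS 2021] can be illustrated with a small sketch. This is not the published technique or the BoDeGHA implementation: it assumes, purely for illustration, that repetitiveness can be approximated by the mean pairwise text similarity of an account's comments, and the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def repetitiveness(comments):
    """Mean pairwise similarity (0.0 to 1.0) between an account's comments."""
    pairs = list(combinations(comments, 2))
    if not pairs:
        return 0.0  # fewer than two comments: nothing to compare
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


def looks_like_bot(comments, threshold=0.8):
    # The threshold is an illustrative assumption, not a calibrated value.
    return repetitiveness(comments) >= threshold


bot_comments = [
    "Coverage increased by 1.2%",
    "Coverage increased by 0.4%",
    "Coverage increased by 3.1%",
]
human_comments = [
    "Thanks, merging this now.",
    "Could you add a test for the empty case?",
    "I cannot reproduce this on Linux.",
]
```

Calling `looks_like_bot(bot_comments)` flags the templated messages, while the varied human comments stay below the threshold. A real classifier would combine several features and be trained on labelled accounts.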
15. Main results – software testing
Identifying inadequate test suites
mutation coverage to measure the strength of a test suite
recommendation of extra asserts to make the suite stronger [VST 2020]
Test amplification
first demonstration of the feasibility of test amplification for dynamically typed languages
SmallAmp – a tool to strengthen test suites within the Pharo Smalltalk ecosystem [EMSE 2022]
AmPyfier – first tool to strengthen test suites within the Python ecosystem [JSEP 2022]
Test transplantation
use tests from dependent projects to increase the coverage of base packages [EASE 2022]
use test slicing to reconstruct the appropriate object states when transplanting tests [SCAM 2022]
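To make the notion of mutation coverage concrete, here is a minimal sketch of how a mutation score could be computed. It is not taken from any of the project's tools: the function under test (`is_adult`), the set of replacement operators and the two test suites are hypothetical and chosen only to show how one extra assert can kill a surviving mutant.

```python
import ast

# Hypothetical function under test, kept as source so it can be mutated.
SOURCE = "def is_adult(age):\n    return age >= 18\n"

# Comparison operators substituted for the original one (illustrative choice).
REPLACEMENTS = [ast.Gt, ast.Lt, ast.LtE, ast.Eq]


def make_mutants(source):
    """Yield one compiled variant of is_adult per replacement operator."""
    for op in REPLACEMENTS:
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, ast.Compare):
                node.ops = [op()]
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        yield namespace["is_adult"]


def weak_suite(f):
    # A single check, far away from the boundary value.
    return f(30) is True


def strong_suite(f):
    # Adds asserts on the boundary (18) and on a negative case.
    return f(30) is True and f(18) is True and f(5) is False


def mutation_score(suite, source=SOURCE):
    """Fraction of mutants for which the suite fails (killed mutants)."""
    mutants = list(make_mutants(source))
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)
```

With these definitions the weak suite leaves the `>` mutant alive, while the suite with the extra boundary assert kills all four mutants.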
16. Main results – software reuse
Release & implementation recommendations for library contributors
target = Ansible Galaxy ecosystem (reusable Infrastructure-as-Code libraries)
automated version increment recommendation (minor, major, patch) [SCAM 2020, JSS 2021, MSR 2021]
detection of 6 novel code smells related to Ansible’s semantics [MSR 2022]
Selection recommendations for library users
helping developers choose a library within vast ecosystems [SoHeal 2020, SCAM 2020, SANER 2022]
Instantiation recommendations for framework users
graph-based mining of frequent framework instantiation patterns [SANER 2019]
capturing the interplays between multiple related instantiation actions [SCAM 2022]
Dependency recommendations for library contributors
quantifying the problem of outdated dependencies [ICSME 2018, JSEP 2019, SANER 2019]
quantifying the outdatedness of packages pre-installed in DockerHub images [SCP 2021, EMSE 2021]
analyzing the adherence to semantic versioning [TSE 2019, SoHeal 2020, SCP 2021]
assessing the impact of security vulnerabilities [MSR 2018, EMSE 2022]
studying the practice of backporting fixes (including security patches) to older releases [TSE 2022]
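The major, minor and patch increments mentioned above follow the semantic versioning convention. The sketch below shows that convention only; it is not the recommendation approach of the cited papers. It assumes plain `MAJOR.MINOR.PATCH` version strings without pre-release tags, and all function names are hypothetical.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_type(old, new):
    """Classify the increment between two releases."""
    o, n = parse(old), parse(new)
    if n <= o:
        raise ValueError("new version must be greater than old version")
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


def recommend(breaking, feature):
    """Semantic versioning rule: breaking change -> major,
    backward-compatible feature -> minor, otherwise patch."""
    if breaking:
        return "major"
    return "minor" if feature else "patch"


def apply_bump(version, bump):
    """Return the next version string for the given increment."""
    major, minor, patch = parse(version)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {bump}")
```

Deciding whether a change is breaking is the hard part and is left outside this sketch; here it is simply passed in as a flag.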
17. Main results – database interactions
Static detection and analysis of SQL bad smells
static detection of bad smells in SQL queries [ICSE 2018]
prevalence and evolution of SQL code smells in data-intensive open source systems [MSR 2020]
Empirical studies on data-intensive systems
analyzing self-admitted technical debt in database access code [EMSE 2022]
investigating the (joint) use of data models and technologies [ER 2021]
Modeling, manipulating and evolving multi-database systems
HyDRa – a conceptual framework to design and manipulate hybrid polystores [ER 2021]
… and to ease their evolution [SANER 2022, BENEVOL 2022]
performance-based recommendation of polystore schema changes [BENEVOL 2022, ER 2022]
automated query adaptation to preserve system consistency [SCAM 2020]
Analyzing database-related testing practices
state-of-the-practice in testing database manipulation code [CAiSE 2021]
taxonomy of best practices for testing database code [Information Systems 2022]
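As a rough illustration of what static detection of SQL bad smells means, the sketch below flags two commonly cited smells in a query string. It is a deliberately simplified, regex-based stand-in and does not reflect how SQLInspect works: a real analyzer extracts and parses the queries embedded in application code. The rule names and patterns are assumptions made for this example.

```python
import re

# Hypothetical rule set: smell name -> pattern that signals it.
RULES = {
    "implicit columns": re.compile(r"\bselect\s+\*", re.IGNORECASE),
    "random selection": re.compile(r"\border\s+by\s+rand\s*\(", re.IGNORECASE),
}


def find_smells(query):
    """Return the sorted names of all rules whose pattern matches the query."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(query))


smelly = "SELECT * FROM users ORDER BY RAND() LIMIT 1"
clean = "SELECT id, name FROM users WHERE id = 42"
```

Here `find_smells(smelly)` reports both smells and `find_smells(clean)` reports none. Pattern matching of this kind cannot see queries that are built up dynamically, which is one reason a static analysis of the host program is needed in practice.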
18. Open Source Tools (1)
BoDeGHA: A command-line tool to identify development bots in GitHub repositories by analysing pull request and issue comments
https://github.com/mehdigolzadeh/BoDeGha
BoDeGiC: An (open source) command-line tool to identify bots in GitHub repositories by analysing git commit messages
https://github.com/mehdigolzadeh/BoDeGiC
SQLInspect: A static SQL analyzer with plug-in support for Eclipse to inspect database usage in Java applications
https://bitbucket.org/csnagy/sqlinspect
GAP: a command-line tool for forecasting future commit activity of contributors involved in software projects distributed through git
https://github.com/AlexandreDecan/gap
ConPan: an open source command-line tool to inspect Docker containers
https://github.com/neglectos/ConPan
19. Open Source Tools (2)
SmallAmp: a test amplification tool in Pharo Smalltalk to create new test methods based on manually written ones to increase mutation coverage
https://github.com/mabdi/small-amp
Small-mince: A tool to slice tests in Pharo Smalltalk
https://github.com/mabdi/small-mince
PaReco: a tool to detect missed opportunities and effort duplication in ecosystems
https://github.com/KadjelRamkisoen/PaReco
Continuous Integration Antipattern Analyzer: a command line tool to analyze CI workflows in git repositories
https://github.com/FreekDS/CIAN
portion: a Python library (with 300+ stars on GitHub) providing data structures and operations to create, manipulate and query disjunctions of intervals of any comparable objects and interval sets out of the box
https://github.com/AlexandreDecan/portion
SISMIC: a Python library providing a tool suite to define, simulate, execute and test statecharts, supporting test-driven development, behaviour-driven development, design by contract, and property statecharts to monitor violations of behavioural properties during statechart execution
https://github.com/AlexandreDecan/sismic
20. Open Source Tools (3)
MUTAMA: a tool recommending MVNRepository tags for a given Java library
https://github.com/cvelazquezr/MUTAMA
RESICO: a tool for resolving the simple names of API types in incomplete code snippets (e.g., from Stack Overflow) to their fully-qualified name
https://github.com/cvelazquezr/RESICO
SCARE: a tool for extracting the structural changes between two releases of an Ansible role published on the Ansible Galaxy ecosystem
https://github.com/ROpdebee/SCARE
LiFUSO: a tool for enumerating library features from its Stack Overflow posts
https://github.com/softwarelanguageslab/lifuso
HyDRa: a framework for hybrid polystore modeling and manipulation
https://github.com/gobertm/HyDRa
npmgraph: A tool for checking license compatibilities for npm packages
https://github.com/IlyasMakari/npmgraph.an
https://zenodo.org/record/5913761