4. Previous Work
Zhou and Mockus, ICSE 2011
[Screenshot: first page of M. Zhou and A. Mockus, "Does the Initial Environment Impact the Future of Developers?" (ICSE 2011), annotated "Low sociability" and "Better training". Highlighted finding: the probability that a new developer becomes a long-term, productive contributor is highest when project sociality is low at the time she joins.]
5. Previous Work
Dagenais et al., ICSE 2010
[Screenshot: first page of B. Dagenais, H. Ossher, R. K. E. Bellamy, M. P. Robillard and J. P. de Vries, "Moving into a New Software Project Landscape" (ICSE 2010), annotated "Mentoring", "project newcomers" and "highly desirable". The paper reports a grounded theory study of 18 newcomers across 18 projects and identifies three factors that shape how newcomers integrate: early experimentation, internalizing structures and cultures, and progress validation.]
6. Characteristics of a Good Mentor
• Enough expertise about the topic of interest for the newcomer…
• Enough ability to help other people…
8. Our Contribution
YODA
(Young and newcOmer Developer Assistant)
An approach for identifying mentors in open source projects
9. YODA: Two phases
1) Identify mentors in past project history
2) Recommend mentors
What factors can be used to identify mentors?
[Figure: SVN, Git and CVS repositories]
10. RQ1: Identifying mentors in past project history
What factors can be used to identify mentors?
Similar problem: identifying advisors in academic collaborations
ArnetMiner (http://arnetminer.org):
• popular search engine for academic researchers in computer science
• identifies relations between students and advisors
11. How does ArnetMiner work?
Ranks pairs of researchers according to four factors:
• f1: they published many papers together
• f2: the advisor published more than the student
• f3: the advisor is older than the student
• f4: the student published her first paper(s) with the advisor
13. Heuristics to identify mentors
F1: Exchanged emails
Jim is the mentor of Alice IF:
• F1: Jim and Alice exchanged emails
[Timeline: when Alice joins the project]
15. Heuristics to identify mentors
F2: Overall amount of emails
Jim is the mentor of Alice IF:
• F1: Jim and Alice exchanged emails
• F2: Jim's overall amount of emails > Alice's
18. Heuristics to identify mentors
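F3: Age in the project
Jim is the mentor of Alice IF:
• F1: Jim and Alice exchanged emails
• F2: Jim's overall amount of emails > Alice's
• F3: Jim's age in the project > Alice's
[Timeline]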
20. Heuristics to identify mentors
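F4: Newcomer "early" emails
Jim is the mentor of Alice IF:
• F1: Jim and Alice exchanged emails
• F2: Jim's overall amount of emails > Alice's
• F3: Jim's age in the project > Alice's
• F4: Alice's first emails in the project were exchanged with Jim
[Timeline: first emails by Alice in the project]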
22. Heuristics to identify mentors
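F5: Commits
Jim is the mentor of Alice IF:
• F1: Jim and Alice exchanged emails
• F2: Jim's overall amount of emails > Alice's
• F3: Jim's age in the project > Alice's
• F4: Alice's first emails in the project were exchanged with Jim
• F5: commits
[Timeline: when Alice joins the project]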
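A minimal Python sketch of how heuristics F1-F4 could be computed for one candidate pair. It assumes a hypothetical `Email` record (sender, recipient, date), treats each heuristic as a yes/no indicator, and reads "early" emails as the newcomer's first `early_k` messages; none of these choices come from the slides, and F5 is left out because the slides do not say how commits are used.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Email:
    """Hypothetical mailing-list record: who wrote to whom, and when."""
    sender: str
    recipient: str  # assumed: the person being written to or replied to
    sent: date


def first_activity(person: str, emails: list[Email]) -> Optional[date]:
    """Date of the first email sent by `person`, or None if they never posted."""
    dates = [e.sent for e in emails if e.sender == person]
    return min(dates) if dates else None


def mentor_heuristics(mentor: str, newcomer: str, emails: list[Email],
                      early_k: int = 3) -> dict[str, bool]:
    """Hypothetical yes/no readings of F1-F4 for a (mentor, newcomer) pair."""
    pair = {mentor, newcomer}
    sent_by = {p: sum(1 for e in emails if e.sender == p) for p in pair}
    m_start = first_activity(mentor, emails)
    n_start = first_activity(newcomer, emails)
    # The newcomer's first `early_k` emails, in chronological order.
    early = sorted((e for e in emails if e.sender == newcomer),
                   key=lambda e: e.sent)[:early_k]
    return {
        # F1: the two exchanged at least one email, in either direction.
        "F1": any({e.sender, e.recipient} == pair for e in emails),
        # F2: the mentor's overall amount of emails exceeds the newcomer's.
        "F2": sent_by[mentor] > sent_by[newcomer],
        # F3: the mentor was active in the project before the newcomer.
        "F3": m_start is not None and n_start is not None and m_start < n_start,
        # F4: some of the newcomer's early emails were addressed to the mentor.
        "F4": any(e.recipient == mentor for e in early),
    }


# Hypothetical mailing-list history.
emails = [
    Email("jim", "bob", date(2001, 1, 10)),
    Email("alice", "jim", date(2001, 6, 1)),
    Email("jim", "alice", date(2001, 6, 2)),
    Email("jim", "bob", date(2001, 6, 5)),
]
print(mentor_heuristics("jim", "alice", emails))
# {'F1': True, 'F2': True, 'F3': True, 'F4': True}
```

Swapping the roles (`mentor_heuristics("alice", "jim", emails)`) makes F2 and F3 false, which is the asymmetry the heuristics are meant to capture.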
28. Recommending Mentors
Inspired by the work on bug triaging by J. Anvik et al., TOSEM 2011
[Diagram: past mentors on a timeline]
29. Recommending Mentors
Inspired by the work on bug triaging by J. Anvik et al., TOSEM 2011
[Diagram: past mentors on a timeline; Alice joins at time t0]
33. Recommending Mentors
Inspired by the work on bug triaging by J. Anvik et al., TOSEM 2011
[Diagram: past mentors on a timeline; Alice joins at time t0; Dice similarity]
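The slides only name Dice similarity as the matching measure. The sketch below assumes, hypothetically, that the newcomer's text (for example, her first emails) and each past mentor's text are reduced to sets of terms and compared with the Dice coefficient 2|A ∩ B| / (|A| + |B|), ranking mentors by decreasing score. The tokenizer, `recommend`, and the sample profiles are illustrative only.

```python
import re


def terms(text: str) -> set[str]:
    """Naive tokenizer: lower-cased runs of letters/digits (underscores split tokens)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def recommend(newcomer_text: str, mentor_profiles: dict[str, str],
              k: int = 2) -> list[tuple[str, float]]:
    """Rank past mentors by Dice similarity to the newcomer's text; keep the top k."""
    query = terms(newcomer_text)
    scored = [(name, dice(query, terms(text))) for name, text in mentor_profiles.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # best score first, ties by name
    return scored[:k]


# Hypothetical profiles: text previously written by each past mentor.
profiles = {
    "jim": "buffer overflow fix in the mod_ssl handshake",
    "carol": "documentation build and website layout",
}
print(recommend("crash in mod_ssl handshake", profiles, k=1))  # jim first, Dice = 8/13
```

Here the query has 5 terms, Jim's profile has 8, and they share 4, so Jim scores 2·4 / (5 + 8) = 8/13 while Carol scores 0.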
34. Empirical Study
Goal: analyze data from mailing lists and versioning systems
Purpose: investigate which factors can be used to identify mentors
Quality focus: recommending mentors in software projects
Context: mailing lists and versioning systems of five software projects:
• Apache, FreeBSD, PostgreSQL, Python and Samba
35. Context
Split into a training set and a test set
                               Apache           FreeBSD          PostgreSQL       Python           Samba
Period (training set)          08/2001-03/2002  11/1998-02/2000  10/1998-05/2001  05/2000-05/2001  04/1998-09/2000
Period (test set)              04/2002-12/2008  03/2000-10/2008  06/2001-03/2008  06/2001-12/2008  10/2000-12/2008
# of mentors (training set)    19               65               10               28               17
# of newcomers (training set)  13               33               8                32               33
# of newcomers (test set)      13               33               7                31               33
36. Research Questions
RQ1: How can we identify mentors from the past history of a software project?
RQ2: To what extent would it be possible to recommend mentors to newcomers joining a software project?
37. RQ1: How can we identify mentors from the past history of a software project?
[Figure: candidate newcomer-mentor pairs ranked by pair score (scores shown: 2.5, 2.5, 1.5, 1.5, 1.0, 1.0); heuristic legend: F1-F5]
38. RQ1: How can we identify mentors from the past history of a software project?
[Figure: the same ranked pairs (scores 2.5, 2.5, 1.5, 1.5, 1.0, 1.0), manually validated (✔); heuristic legend: F1-F5]
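A sketch of how ranked, manually validated pairs turn into a precision curve, assuming precision at N is the fraction of validated pairs among the N highest-scoring pairs. The `Pair` type, names, and scores are hypothetical.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """A hypothetical candidate (mentor, newcomer) pair with its heuristic score."""
    mentor: str
    newcomer: str
    score: float


def precision_curve(pairs: list[Pair],
                    validated: set[tuple[str, str]]) -> list[tuple[int, float]]:
    """Return (N, precision of the top-N pairs) for N = 1..len(pairs)."""
    # Highest scores first; sorted() is stable, so ties keep their input order.
    ranked = sorted(pairs, key=lambda p: p.score, reverse=True)
    curve: list[tuple[int, float]] = []
    hits = 0
    for n, p in enumerate(ranked, start=1):
        if (p.mentor, p.newcomer) in validated:
            hits += 1
        curve.append((n, hits / n))
    return curve


pairs = [Pair("jim", "alice", 2.5), Pair("ann", "bob", 1.0), Pair("kim", "eve", 1.5)]
validated = {("jim", "alice"), ("kim", "eve")}
print(precision_curve(pairs, validated))
# [(1, 1.0), (2, 1.0), (3, 0.6666666666666666)]
```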
39. RQ1: How can we identify mentors from the past history of a software project?
[Plot: precision vs. number of newcomer-mentor pairs across the possible configurations; highlighted configuration: f1; heuristic legend: F1-F5]
40. RQ1: How can we identify mentors from the past history of a software project?
[Plot: precision vs. number of newcomer-mentor pairs across the possible configurations; highlighted configuration: f1+f2+f3; heuristic legend: F1-F5]
41. RQ1: How can we identify mentors from the past history of a software project?
[Plot: precision vs. number of newcomer-mentor pairs across the possible configurations; highlighted configuration: f1+f2+f4; heuristic legend: F1-F5]
42. RQ1: How can we identify mentors from the past history of a software project?
[Plot: precision vs. number of newcomer-mentor pairs across the possible configurations; highlighted configuration: f5 (baseline); heuristic legend: F1-F5]
43. RQ1: How can we identify mentors from the past history of a software project?
[Plots: precision vs. number of newcomer-mentor pairs for Apache and PostgreSQL; configurations: f1, f1+f2+f3, f1+f2+f4, f5 (baseline); heuristic legend: F1-F5]
45. RQ1: How can we identify mentors from the past history of a software project?
[Plots: precision vs. number of newcomer-mentor pairs for Python, FreeBSD and Samba]
46. RQ1: How can we identify mentors from the past history of a software project?
Useful factors for mentor identification
[Plots: precision vs. number of newcomer-mentor pairs for Python, FreeBSD and Samba; configurations: f1, 0.5*f1 + 0.25*f2 + 0.25*f3, 0.5*f1 + 0.25*f2 + 0.25*f4; heuristic legend: F1-F5]
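The weighted configurations 0.5*f1 + 0.25*f2 + 0.25*f3 and 0.5*f1 + 0.25*f2 + 0.25*f4 can be read as linear scores over the factors. A sketch assuming each factor is already normalized to [0, 1] (the slides do not state how factors are scaled); the candidate names and factor values are hypothetical.

```python
# Weights copied from the configuration labels on the slide.
CONFIGS: dict[str, dict[str, float]] = {
    "0.5*f1 + 0.25*f2 + 0.25*f3": {"f1": 0.5, "f2": 0.25, "f3": 0.25},
    "0.5*f1 + 0.25*f2 + 0.25*f4": {"f1": 0.5, "f2": 0.25, "f4": 0.25},
}


def weighted_score(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Linear combination of factor values; a missing factor counts as 0."""
    return sum(w * factors.get(name, 0.0) for name, w in weights.items())


def rank_mentors(candidates: dict[str, dict[str, float]],
                 weights: dict[str, float]) -> list[tuple[str, float]]:
    """Score every candidate mentor for one newcomer; best first, ties by name."""
    scored = [(m, weighted_score(f, weights)) for m, f in candidates.items()]
    return sorted(scored, key=lambda t: (-t[1], t[0]))


# Hypothetical factor values (assumed normalized to [0, 1]) for two candidates.
candidates = {
    "jim": {"f1": 1.0, "f2": 1.0, "f3": 0.0, "f4": 1.0},
    "ann": {"f1": 0.0, "f2": 1.0, "f3": 1.0, "f4": 0.0},
}
for label, weights in CONFIGS.items():
    print(label, rank_mentors(candidates, weights))
```

Because f1 carries half the weight in both configurations, a candidate who actually exchanged emails with the newcomer tends to outrank one who only satisfies the seniority-style factors.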
47. RQ2: To what extent would it be possible to recommend mentors to newcomers joining a software project?
[Bar chart: Top-1 and Top-2 precision of mentor recommendation for Apache, FreeBSD, PostgreSQL, Python and Samba; bar labels: 100%, 100%, 94%, 85%, 82%, 81%, 77%, 64%, 30%, 24%]
49. RQ2: To what extent would it be possible to recommend mentors to newcomers joining a software project?
[Bar chart: Top-1 and Top-2 precision of mentor recommendation for Apache, FreeBSD, PostgreSQL, Python and Samba; bar labels: 100%, 100%, 94%, 85%, 82%, 81%, 77%, 64%, 30%, 24%]
✔ YODA makes it possible to recommend mentors
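Top-1 and Top-2 precision can be read as the fraction of test-set newcomers whose actual mentor appears among the first k recommended mentors. The sketch below assumes that reading (the slides do not spell out the definition); the `top_k_precision` helper and the sample data are hypothetical.

```python
def top_k_precision(recommended: dict[str, list[str]],
                    actual: dict[str, str], k: int) -> float:
    """Share of newcomers whose actual mentor is among their top-k recommendations."""
    evaluated = [n for n in actual if n in recommended]
    if not evaluated:
        return 0.0
    hits = sum(actual[n] in recommended[n][:k] for n in evaluated)
    return hits / len(evaluated)


# Hypothetical ranked recommendations and ground-truth mentors for three newcomers.
recommended = {"alice": ["jim", "ann"], "bob": ["kim", "jim"], "eve": ["ann", "kim"]}
actual = {"alice": "jim", "bob": "jim", "eve": "jim"}
print(top_k_precision(recommended, actual, k=1))  # 1 of 3 newcomers hit
print(top_k_precision(recommended, actual, k=2))  # 2 of 3 newcomers hit
```

Top-2 precision can never be lower than Top-1 precision under this definition, since the top-2 list contains the top-1 list.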
60. Perceived importance of mentoring
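[Bar chart: survey answers (useless at all, not important, neutral, important, very important) for two items, "effect of mentor" and "effect on newcomer". No respondent chose "useless at all" or "not important"; one item received neutral 11%, important 56%, very important 33%, the other neutral 45%, important 36%, very important 18%.]
"[It] is very important that a mentor shares knowledge with a mentee…"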
61. What makes a good mentor
• Others: 0%
• Project knowledge: 38%
• Communication skills: 42%
• Experience: 19%
62. What makes a good mentor
"My first mentor had a very strong and technical background"
• Others: 0%
• Project knowledge: 38%
• Communication skills: 42%
• Experience: 19%