Scientific Software - what happens after the grant?
1. Sustaining scientific infrastructures:
transitioning from grants to peer production
James Howison
School of Information
University of Texas at Austin
2 September 2016
@jameshowison
(slides on slideshare, see twitter for link)
This material is based upon work supported by the US National Science Foundation under
Grant Nos. SMA-1064209 (SciSIP), OCI-0943168 (VOSS) and ACI-145348 (CAREER).
2. Supporting Scientific software
after grants run out
• What happens when the grant ends?
– It’s hard, hard work to keep the code from
inevitable “bit-rot”
5. Open projects are not like grants
1. Governance
2. Collaboration infrastructures
3. Contribution processes
4. Service center vs. Base for community
“open sourcing” means full-on
sociotechnical change
6. A literature on transfer to open?
• Copious literature on commercialization,
“Technology Transfer” but not communities
• Happily there are promising literatures
– Studies of open source and online communities
(Resnick, Crowston, Wiggins, Kittur, Kraut, Lampe, Ellison, …)
– Studies of scientific practice
(Palmer, Borgman, Vertesi, Edwards, Olsons, Finholt, Lee/Bietz,
Østerlund, Sawyer, Tapia, Ludders, …)
– Studies of infrastructural work
(Bowker, Jackson, Vertesi, Ribes, …)
7. How can scientific software projects successfully
transition from grant support to thriving peer
production communities?
Research Design:
1. Theoretically sampled case studies
2. Longitudinal panel study
8. Questions for each case:
How did they succeed or fail in building peer
production?
– What actions were taken to change the project?
– How did routines in the project change as a
result?
– What conditions are relevant to the success of
those actions in causing change?
9. Sampling success and failure
• Very hard to have people talk about failures
– Records are often unavailable
– Constant problem in studies of open source
• Panel study offers help here
– Enroll early, before outcome clear
– Build trust, chart course, keep records
– Selected the NSF SI2 funding program
(program officer support)
10. Panel Study setup
• SI2 program contributed to over 350 grants
• Three step qualitative content analysis:
1. Did the grant intend to create software?
2. What documents (URLs, Workshop reports, or
Publications) are available?
3. Read these, apply coding scheme
11. Content analysis categories
Code: Description
• Project Presents Separate From Grant: Does the grant support the project (e.g., pre-existing), or is the project only there because of the grant?
• inviteToContribute, contributionProcess: Is there an explicit invitation for outsiders to contribute? Is there a process for taking contributions?
• highlightsPublication: e.g., Does the project have a “publications tab”?
• creditsNonPI Contributors: Are only the PIs credited (“the PIs and their teams”) or a wider group?
• associatedRepository, CodeAvailable, license: Is code available? Is it openly hosted? Where? Under what license?
• Collaborative setup (wiki, bugtracker), Online meetings?: What set of collaborative tools are they using?
• Offline meetings: Does the project organize offline meetings, and of what kinds (user workshop, hackathon)?
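One way to picture a single coded observation under the content analysis categories on this slide is as a small record type. This is a minimal sketch, not the study's actual instrument: the class name, the snake_case field names, the field types, and the use of `None` for "not yet coded" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ProjectObservation:
    """One coding of one project on one day (hypothetical structure)."""

    project: str
    observed_on: date
    # Each field mirrors a category from the slide; None means "not yet coded".
    separate_from_grant: Optional[bool] = None
    invite_to_contribute: Optional[bool] = None
    contribution_process: Optional[bool] = None
    highlights_publication: Optional[bool] = None
    credits_non_pi_contributors: Optional[bool] = None
    code_available: Optional[bool] = None
    associated_repository: Optional[str] = None  # where the code is hosted
    license: Optional[str] = None
    online_meetings: Optional[bool] = None
    # Free-form lists, e.g. ["wiki", "bugtracker"] or ["user workshop"].
    collaborative_tools: List[str] = field(default_factory=list)
    offline_meetings: List[str] = field(default_factory=list)

    def uncoded(self) -> List[str]:
        """Names of categories the coder has not filled in yet."""
        return [name for name, value in vars(self).items() if value is None]


# Illustrative values only; "ExampleProject" is not a project from the study.
obs = ProjectObservation(
    project="ExampleProject",
    observed_on=date(2016, 9, 2),
    invite_to_contribute=True,
    collaborative_tools=["wiki", "bugtracker"],
)
print(obs.uncoded())
```

Keeping one record per project per observation date would let repeated codings of the same project be compared over the life of the panel.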
12. Build dataset over time
• Training new graduate student on scheme
– May involve additional students over time
• Intend to code ~5 projects a weekday for two
years
– 300 projects, 250 weekdays in year, 5 projects a
day, 2 coders, assume some missed days!
– ~5-10 observations of each project a year
• Also analyze repositories, where available.
• Adding content analysis codes over time
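The coding workload figures on this slide can be checked with a few lines of arithmetic. This is a rough sketch: the function name and the `missed_fraction` parameter are hypothetical, and treating "5 projects a day" as either a team total (one coder's worth) or a per-coder rate (two coders) is an assumption, since the slide does not say which is meant.

```python
def observations_per_project(projects, weekdays_per_year, codings_per_day,
                             coders=1, missed_fraction=0.0):
    """Average number of times each project is coded in a year."""
    total_codings = (weekdays_per_year * codings_per_day * coders
                     * (1.0 - missed_fraction))
    return total_codings / projects


# Figures from the slide: 300 projects, 250 weekdays, 5 projects a day.
team_total = observations_per_project(300, 250, 5, coders=1)
per_coder = observations_per_project(300, 250, 5, coders=2)
# Hypothetical allowance: one in five coding days missed.
with_missed = observations_per_project(300, 250, 5, coders=2,
                                       missed_fraction=0.2)

print(round(team_total, 1), round(per_coder, 1), round(with_missed, 1))
```

Under these assumptions the estimates fall between roughly 4 and 8 observations per project per year, which is in line with the slide's "~5-10".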
13. Case Method: Sampling
[Figure: cases plotted by use-context diversity (low to high; labels: Individual, Science project, Domain, Platform, General purpose) against number of users (Few to Many). Marked regions: “Generally unreachable area” and “Unlikely region”. Cases shown: yt, ENZO, Eclipse PTP, OODT/Airavata.]
14. Case Method: analysis
• Identify work episodes
– Ground interviews in specific production work.
– Source-code repositories help immensely
– “Digital trace ethnography” (Ribes and Geiger)
• Identify socio-technical changes that divide
project into stages
– Investigate actions that precipitated changes
• Project narratives with illustrative vignettes
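As an illustration of how repository history might suggest candidate work episodes to raise in interviews, commit timestamps can be grouped wherever activity pauses. This is a hedged sketch and not the method used in the study: the function name, the gap-based heuristic, and the 14-day threshold are all assumptions, and the timestamps are made up.

```python
from datetime import datetime, timedelta


def group_into_episodes(timestamps, max_gap=timedelta(days=14)):
    """Split commit times into runs separated by pauses longer than max_gap."""
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][-1] <= max_gap:
            # Close enough to the previous commit: same candidate episode.
            episodes[-1].append(ts)
        else:
            # First commit, or a long pause: start a new candidate episode.
            episodes.append([ts])
    return episodes


# Made-up commit times for illustration.
commits = [
    datetime(2010, 1, 1),
    datetime(2010, 1, 3),
    datetime(2010, 3, 1),
    datetime(2010, 3, 2),
]
episodes = group_into_episodes(commits)
for episode in episodes:
    print(episode[0].date(), "to", episode[-1].date(), len(episode), "commits")
```

Groupings like this are only a starting point; which bursts of activity count as meaningful episodes would still need to be settled through interviews and documents.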
16. ENZO pilot study
Data:
• 5 interviews, so far (thanks Eunyoung Moon!)
• Publications, websites, workshop websites,
source code repositories
Analysis:
– Creation of timeline
– Identification of episodes and 4 project phases
(with their precipitating events)
17.
• No central base to which changes are coming and going
• Copy and pasting features across personal branches
• Single lab
18.
• ENZO lab reforms as “Service Center” (grant)
• Mainline branch internally, releases externally
• Little expectation of contributions coming back in
• “Friendly user” labs internally functioning like “early days”
19. The “Week of Code”
• Director of external lab (former post-doc) has
new job at Stanford (with startup funds!)
• Learns of various versions through
conversations at conferences and reviewing(!)
• Focus is on collaboration infrastructure, not
governance.
• Begin with the code of those not present
20.
• Central branch to which both core and outsiders contribute
• Development continues separately in external labs
• Called “Wild West” by participants, autonomy concerns.
21.
• Introduction of “code revision” (pull requests)
• External lab members on similar footing to Core members
• Review helps members not “step on each other’s work”
22. Change
• What hasn’t changed:
– Motivations (code is side-effect of scientific
inquiry, papers first, code second), no commercial
value
• Challenges to change
– Leadership’s emotional connection, difficulty of
passing on leadership.
– Giving up autonomy (being “blocked” in one’s
work)
23. What worked
• Always: collaboration technology before
governance (contra “Collaboration Readiness”
(Olson et al.) TORSC?).
• Social proof: visible action in public
• Inspiration from open source
• Working alongside, rather than with.
Superposition rather than Teamwork.
24. Additional CAREER elements
• Teaching course on online communities
– Incorporating more on managing software
projects in science
• Contributing modules to Software Carpentry
– 2-3 day workshops with graduate students
– Enough command line, python, SQL to get them
working
– I’m going to contribute module on contributing to
and running software projects in science
25. Conclusions
• Software engineering, but in a very specific
context
• Organization of software work, but different from
design and testing of methodologies
• Can also link in resource and motivation
situations
• Learning from open source, building
alternative paths alongside commercialization.
Editor's notes
Software is important, but many other examples as well.
Peer production takeaway: change is substantial, not natural, and not easy.
Goal is to adapt and extend these literatures, building theory and actionable knowledge for practitioners
So, as an experiment to start playing around algorithm involved in AMR … I started creating this, and I also wanted to learn C++. I started creating this code. And that was eventually grown up as ENZO.
PI stitched together support, scrounging from grants, startup funds and the somewhat “fictional” 20 hours a week of graduate students