2. Overview
• Background
• Exposition: Sociology of Science
• Broad generalizations about science
• Example: FLOSS Research
• Little science context for eScience research
• Expectations: What next?
http://www.flickr.com/photos/pmtorrone/304696349/
3. My Background
• BA: Maths with economics
• Nonprofit & IT industry work
• Adult literacy, nonprofit management support,
professional theatre
• Web analytics
• MSI: Human-computer interaction,
complex systems & network science
• PhD: Information science & technology
4. Science
• Systematic investigation for the production of knowledge
• Scientific method emphasizes reproducibility
• Not all phenomena are reproducible...
• Many categories
• Experimental, applied, social, etc.
• Categories are not mutually exclusive
http://www.flickr.com/photos/radiorover/419414206/
5. Paradigms & Revolutions
• Kuhn - Laws, theories, applications & instrumentation that create
coherent traditions of scientific research
• Paradigms help us direct
our research, but limit
our view of the world
• New technologies can
lead to scientific revolutions
by revealing anomalies
http://www.flickr.com/photos/weichbrodt/644302381/
6. Normal Science
• Kuhn - “normal science” is research based on broadly accepted scientific
paradigms
• Shared paradigms are based on rules
and standards for scientific practice
• Key requirement: agreement on
focus and conduct of research
• ∃(Grand Challenges)|Discipline
http://www.flickr.com/photos/themadlolscientist/2421152973/
7. Big Science
• de Solla Price - “Big Science” is...
• Inherently paradigmatic
• Always normal science
• Produces detailed insights into
the minutiae of phenomena
studied in the paradigm
http://www.flickr.com/photos/31333486@N00/1883498062/
8. Pre-paradigmatic Science
• Paradigms require agreement on...
• Epistemology
• Ontology
• Methodology
• Most social sciences are pre-paradigmatic
• Primarily exploratory research
• Very little replication
http://www.flickr.com/photos/askpang/327577395/
9. Little Science
• de Solla Price - “Little Science” is a
romanticized precursor to Big Science,
featuring lone, long-haired geniuses
misunderstood by society, etc.
• If it’s not Big Science, it’s Little Science
• Pre-paradigmatic and fraught with ambiguity
• Often fundamentally exploratory
• Epistemological/theoretical/methodological
divergence among researchers
http://www.flickr.com/photos/mrjoax/2548045246/
10. Social Science
• Social science is real science: the goal is systematic knowledge production
• Focuses on the study of the social life of human groups and individuals
• IMHO, fundamentally more difficult than
“hard” sciences due to infinite
complexity of social phenomena
• Replicability is a major challenge
with respect to scientific method
• Not all social science can or should
aspire to replicability
http://www.flickr.com/photos/smiteme/2379629501/
11. Normalizing Science
• Becoming a normal science requires community and convergence
• ∃(community) != ∃(agreement)
• Establishing grand challenges and
methods is a primary task
of normalizing
• Resistance to change is pervasive
http://www.flickr.com/photos/9036026@N08/2949211479/
12. Scientific Collaboration
• Collaboration requires common focus, if not also epistemology and ontology
• Challenging enough in normal sciences
• Harder in pre-paradigmatic research
• Economics: systemic disincentives to
collaborate, versus potential benefits
and ideals of science
http://www.flickr.com/photos/richardsummers/542738965/
13. Big Science Collaboration
• LHC, CERN, etc.
• Thousands of collaborators
• Complex but coordinated,
at least somewhat centralized
• Requires shared goals and resources,
plus (lots of) communication
• Only happens in normal sciences
http://www.flickr.com/photos/8767020@N08/531355152/
14. Little Science Collaboration
• A Professor & a grad student, give or take
• Localized goals and resources
• -> localized research practices
• Small research teams
• Fundamentally difficult to achieve
consensus that allows larger groups
• Restricts the ability to obtain funding
and undertake ambitious projects
http://www.flickr.com/photos/lamazone/2735939345/
15. Scientific Collaboration Requirements
• Shared goals
• Establishes focus of research
• Shared research resources
• Both social and artifactual
• Social aspects include
training and community
socialization
we can has share?
http://www.flickr.com/photos/ryanr/142455033/
16. Historical Research Artifacts
• Letters, Books, Journals, Lectures
• Also technologies: methods, instrumentation
• Sharing?
• Recordkeeping is not always
a researcher’s main priority
• Without records, there’s not
much to share except the
research outputs
http://www.flickr.com/photos/smailtronic/1535870363/
17. Today’s Research Artifacts
• Large scale datasets, scripts, software, workflows, papers, images, video,
audio, annotations, ephemera, web sites...
• “Research objects” -
bundling all the pieces together
• Hybrids of boundary objects
and touchstones
• Technologies -> scientific revolution!
• Open science
http://www.flickr.com/photos/smiteme/2379630899/
18. Example: FLOSS Research
• Phenomenological & interdisciplinary
• Software engineering,
Information Systems,
Anthropology,
Sociology,
CSCW,
etc...
• Ethos
• (Idealistic) combination
of open source values
and scientific values
http://www.flickr.com/photos/themadlolscientist/2542236565/
19. FLOSS Phenomenon
• Free/Libre Open Source Software
“Free as in speech, free as in beer” - liberty versus cost
• Distributed collaboration
to develop software
• Volunteers and sponsored
developers
• Community-based model
of development
http://www.flickr.com/photos/prawnwarp/541526661/
20. Typical FLOSS Research Topics
• Coordination and collaboration
• Growth and evolution (social and code)
• Code quality
• Business models and firm involvement
• Motivation, leadership, success
• Culture and community
• Intellectual property and copyright
http://www.flickr.com/photos/eean/519258881/
21. What we study @ SU
• Social aspects of FLOSS
• What practices make some distributed work teams more effective than
others?
• How are these practices developed?
• What are the dynamics through which self-organizing distributed teams
develop and work?
22. Sharing FLOSS Research Artifacts
• Community: Small but growing, maybe around 400 researchers worldwide,
with lively face-to-face interaction but relatively low listserv activity
• Data: Lots of it, and readily available, though often difficult to use for several
reasons
• Analyses and tools: Not quite as
easy to get, but there if you can
find them
• Papers: Repositories are as yet
underdeveloped, but efforts are
underway
http://www.flickr.com/photos/12698507@N08/2762563631/
23. FLOSS Research Community
• Handful of small research groups, mostly in UK & Europe
• Most often found in Software Engineering departments
• International conferences
targeted to academics,
developers, or both
• OSS, ICSE, FOSDEM, etc.
• IFIP WG 2.13
http://www.flickr.com/photos/steevithak/2883218362/
24. FLOSS Research Data
• Data sources include interviews, surveys, and ethnographic fieldwork
• Digital “trace” data: archival, secondary,
by-product of work; easy to get but hard to use
• Repositories
• Hosting “forges” like SourceForge,
FreshMeat, RubyForge, etc.
• RoRs: Repositories of Repositories
• Data sources for research
25. We Built It...
• Motivations
• Stop hammering forge servers, getting entire campus IPs blocked...
• Stop reinventing the wheel!
• Adoption
• Shared data sources
seeing increasing use
• Next step is harder:
sharing tools and workflows
http://www.flickr.com/photos/circulating/997909242/
26. RoRs: FLOSSmole
• Multiple PIs @ Syracuse, Elon, & Carnegie Mellon
One grad student @ SU (me), a couple of undergrads @ Elon
• Public access to 300+ GB data on
• 300K+ projects from 8 repositories
• Flat files & SQL datamarts
• Released via SourceForge (SF) & Google Code (GC)
• 5 TB allotment on TeraGrid @ SDSC
27. RoRs: FLOSSmetrics
• Produced by LibreSoft with academic and corporate partners
• Public access to data for 2800+ projects
• Analyzed & raw data from CVS, email, trackers
• Tools for:
• calculating code metrics
• parsing trackers
• parsing email lists
28. RoRs: SRDA
• SourceForge Research Data Archive
• One PI @ University of Notre Dame
• One massive 300 GB+ SQL db of monthly dumps from SourceForge
• Original obtuse structure,
regular table deprecation,
some documentation
• Gated access: researchers only,
condition of data release from SF
29. RoRs: Emerging Sources
• Ultimate Debian Database (UDD)
• 300 MB compressed Postgres DB,
produced by Debian community
• Planning to add to FLOSSmole
31. FLOSS Research Papers
• First, there was opensource.mit.edu
• They no longer maintain it, and gave us the data
• Work-in-progress working papers
repository at FLOSSpapers.org
• Essential viability problem is that
repositories require long-term
stewardship...
• ...which requires long-term
commitments of funding and
personnel, not just volunteers
32. FLOSS Research Collaboration
• Multiple partners involved in producing FLOSSmole & FLOSSmetrics
• Federated data sources by choice,
starting to develop ontologies
• As yet, a Little Science domain
• Cross-institutional collaboration
poses many challenges
• Usual difficulties magnified by
general lack of resources, both
financial and human
33. Latest Initiatives
• Resource-oriented
• Expanding resources: data, research artifacts, and pedagogical materials
• DOIs: 10.4118/*
• Semantic data
interoperability
• Community-oriented
• FLOSShub.org
34. Evangelizing eScience
• Made presentations at OSS conferences: well received, but hard to make
converts for several reasons
• Tried to get other research group members to use Taverna: learning overhead
is too high for most
• Submitted a paper on eScience
to an IS conference: rejected
because reviewers were unable
to adequately evaluate eScience
as a topic, as it’s too unfamiliar
• Currently just doing our work this
way, as an exemplar
http://www.flickr.com/photos/naezmi/2418745377/
35. Barriers to Uptake
• Lack of agreement in research focus, theory, methods; researcher isolation
• Bimodal distribution of requisite skills
• “I can’t possibly do that! I can’t code!”
• “Why bother? I can code my own.
You should too; just use Python.”
“Overheard” on Twitter:
Friend #1: i HATE that openoffice automatically took
over my "open with..." defaults.
Friend #2: @Friend #1 <opensourcedeveloper> If you
don't like it, then why don't you submit code to change
the behavior!? </opensourcedeveloper>
http://www.flickr.com/photos/noner/1739876378/
36. What I had to learn to get this far
• Taverna
• A lot more Unix terminal & XML
• Relational DB management & SQL
• More R, plus packages and
dependency management
• Java & Eclipse - just enough to
write my own Beanshells
• SVN & SSH
• A little bit of OWL, RDF, & SPARQL
• I would not have taken this on if I
had known what was in store, but
once I got started, I was hooked
http://www.flickr.com/photos/sashala/292868436/
37. Sociotechnical Engineering
• Tools are part of the solution, thanks to brilliant CS and SE people
• Social elements are the true barrier
• Awareness of methods and
benefits
• Incentive systems
• Resistance to change
(paradigms again)
• Proof of concept is difficult
http://www.flickr.com/photos/pinprick/3117108495/
38. Using Taverna for Little eScience
• Implementing analysis is usually easy
• Data handling is almost always hard
• All data are in SQL databases, with consistent IDs
• Lots of data manipulation is required
• Avoiding web services as much as possible
• Infrastructure and resources are limited
• Benefit is truly questionable: AFAIK, I am 50% of the user base...
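The data-handling point can be illustrated with a minimal sketch, assuming an in-memory SQLite database and two hypothetical tables (`projects` and `downloads`) that share a project ID. The table names, column names, and rows are made up for illustration; they are not the FLOSSmole or SRDA schemas.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE projects (project_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE downloads (project_id INTEGER, day TEXT, downloads INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO projects VALUES (?, ?)",
        [(1, "alpha"), (2, "beta")],
    )
    conn.executemany(
        "INSERT INTO downloads VALUES (?, ?, ?)",
        [(1, "2009-01-01", 10), (1, "2009-01-02", 5), (2, "2009-01-01", 7)],
    )
    return conn


def total_downloads_by_project(conn):
    """Join the two tables on the shared project ID and total the downloads."""
    query = """
        SELECT p.name, SUM(d.downloads) AS total
        FROM projects AS p
        JOIN downloads AS d ON d.project_id = p.project_id
        GROUP BY p.project_id, p.name
        ORDER BY p.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, total in total_downloads_by_project(build_demo_db()):
        print(name, total)
```

Consistent IDs are what make the join a one-liner; most of the real effort goes into getting the data into this shape in the first place.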
39. Example: Our Recent Research
• Estimating user base and potential user interest in FLOSS projects
• Based on common release-and-download patterns
• Proxy for project success, a common dependent variable
[Figure: downloads over time across releases (Version 0.5, Version 0.6, Version 0.7). Annotations: area under curve is active users updating; potential user experimentation; active user base growth (good publicity?)]
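One way to read the release-and-download pattern is as a sum of downloads per release. The following is a minimal sketch, assuming daily download counts and known release dates. The function name, the dates, the counts, and the choice to attribute each download to the most recent release are illustrative assumptions, not the estimation method used in the research itself.

```python
from bisect import bisect_right
from datetime import date


def downloads_per_release(daily_downloads, releases):
    """Sum daily downloads between consecutive release dates.

    daily_downloads: list of (date, count) pairs, in any order
    releases: list of (version, release_date) pairs, sorted by release_date
    Returns a dict mapping each version to the downloads recorded from its
    release date up to (not including) the next release date.
    """
    release_dates = [released for _, released in releases]
    totals = {version: 0 for version, _ in releases}
    for day, count in daily_downloads:
        # Index of the latest release on or before this day.
        idx = bisect_right(release_dates, day) - 1
        if idx >= 0:  # ignore downloads dated before the first release
            totals[releases[idx][0]] += count
    return totals


if __name__ == "__main__":
    # Hypothetical dates and counts, for illustration only.
    releases = [("0.5", date(2009, 1, 1)), ("0.6", date(2009, 1, 4))]
    daily = [
        (date(2008, 12, 31), 99),
        (date(2009, 1, 1), 10),
        (date(2009, 1, 2), 4),
        (date(2009, 1, 4), 8),
        (date(2009, 1, 5), 3),
    ]
    print(downloads_per_release(daily, releases))
```

Separating the post-release spike from the baseline between releases would need a further modeling step that this sketch does not attempt.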
44. Interpretation
• Taverna is not a “normal” open source project
• Speaking tours, tutorials, articles, and other events influence downloads
• What this demonstrates...
• Care is needed with quantitative measures
• Not all open source projects are the same
• Taverna users are just as reactive as any other users
http://www.flickr.com/photos/pagedooley/2121472112/
45. Where next?
• Adoption is a long-term agenda, as changing social practices doesn’t happen
overnight
• For FLOSS research and our disciplinary communities
• We will keep doing our work this way,
and hope to draw in others
“Won’t you come out and play?”
http://www.flickr.com/photos/atiq/2658884520/
46. Thanks!
• Credits where they are due
• Kevin Crowston, my advisor
• James Howison, my collaborator
• Everett Wiggins, my husband