Although science has become an increasingly collaborative endeavor over the last hundred years, only little attention has been devoted to supporting scientific communities. Our work focuses on scientific collaborations that revolve around complex science questions that require significant coordination to synthesize multi-disciplinary findings, enticing contributors to remain engaged for extended periods of time, and continuous growth to accommodate new contributors as needed as the work evolves over time. This paper presents a virtual crowdsourcing community for open collaboration in science processes to address these challenges. Our solution is based on the Semantic MediaWiki and extends it with new features for scientific collaboration. We present preliminary results from the usage of the interface in a pilot research project
A Virtual Crowdsourcing Community for Open Collaboration in Science Processes
1. A Virtual Crowdsourcing Community for
Open Collaboration in Science Processes
1Software Engineering for Business Information Systems, Technical University of Munich
2Information Sciences Institute, University of Southern California
Felix Michel12, Yolanda Gil2, Varun Ratnakar2,Matheus Hauder1
21th Americas Conference on Information Systems 2015
Organic Data Science Framework
http://www.organicdatascience.org/
2. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 2
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Evolution of the scientific enterprise
Evolution of the scientific enterprise from [Barabasi, 2005] extended with the
ATLAS Detector Project at the Large Hadron Collider [The ATLAS Collaboration, 2012].
Motivation
single-authorship co-authorship large number
co-authors
the community
as author
3. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 3
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Requirements of Scientific Collaborations
Introduction
Significant organization
and coordination
Maintaining a community
over the longer term
Growing the community
based on unanticipated needs
R1:
R2:
R3:
4. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 4
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Key objectives are:
I. Self-organization of the work
II. Sustainable on-line communities
III. Open science processes that expose
all tasks and activities publicly
Reducing the coordination effort, lower
the barriers to growing the community
Organic Data Science Framework
Approach
5. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 5
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Social Design Principles
Selected social principles from [Kraut and Resnick 2012] for building successful online communities that can be applied to Organic Data Science.
A1: Carve a niche of interest, scoped in terms of topics,
members, activities, and purpose
A2: Relate to competing sites, integrate content
A3: Organize content, people, and activities into
subspaces once there is enough activity
A4: Highlight more active tasks
A5: Inactive tasks should have “expected active times”
A6: Create mechanisms to match people to activities
B1: Make it easy to see and track
needed contributions
B2: Ask specific people on tasks of interest to them
B3: Simple tasks with challenging goals
are easier to comply with
B4: Specify deadlines for tasks,
while leaving people in control
B5: Give frequent feedback specific to the goals
…
B10 …
C1: Cluster members to help them identify
with the community
C2: Give subgroups a name and a tagline
C3: Put subgroups in the context of a larger group
C4: Make community goals and purpose explicit
C5: Interdependent tasks increase
commitment and reduce conflict
D
D1: Members recruiting colleagues is most effective
D2: Appoint people responsible for immediate
friendly interactions
D3: Introducing newcomers to members
increases interactions
D4: Entry barriers for newcomers help
screen for commitment
D5: When small, acknowledge each new member
…
D12 …
B
A C
Approach
Starting communities
Encouraging contributions
through motivation
Encouraging commitment
Dealing with newcomers
6. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 6
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Best Practices from Polymath and Encode
Selected best practices from the Polymath [Nielsen 2012] project and lessons learned from ENCODE [Encode 2004].
E1: Permanent URLs for posts and comments, so others can refer to them
E2: Appoint a volunteer to summarize periodically
E3: Appoint a volunteer to answer questions from newcomers
E4: Low barrier of entry: make it VERY easy to comment
E5: Advance notice of tasks that are anticipated
E6: Keep few tasks active at any given time, helps focus
F1: Spine of leadership, including a few leading scientists and 1-2 operational project managers,
that resolves complex scientific and social problems and has transparent decision making
F2: Written and publicly accessible rules to transfer work between groups, to
assign credit when papers are published, to present the work
F3: Quality inspection with visibility into intermediate steps
F4: Export of data and results, integration with existing standards
E
F
Approach
Lessons learned from ENCODE
Best practices from Polymath
7. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 7
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
The Organic Data Science Framework
Organic Data Science Wiki
8. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 8
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Welcome Page0
Welcome Page0
Organic Data Science Wiki
A1: Carve a niche of interest, scoped in terms of
topics, members, activities, and purpose
D6: Advertise members particularly
community leaders, include pictures
D7: Provide concrete incentives to early members
E6: Keep few tasks active at any given time, helps focus
[A1, A2,A3,B7,D1, D5, D6, D7, E2, E6, F1, F2, F4]
II. Sustainable On-Line Communities
III. Opening Science Process
9. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 9
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Task Representation1
Organic Data Science Wiki
Task Representation1
B1: Make it easy to see and
track needed contributions
C3: Put subgroups in the context
of a larger group
E1: Permanent URLs for posts and
comments, so others can refer to them
[A3, A4, A6, B1, B3, B10, C2, C3, C4, C5, E1, F3]
I. Self-Organization
III. Opening Science Process
10. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 10
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Task Metadata2
Organic Data Science Wiki
Task Metadata2
A5: Inactive tasks should have "expected active times“
A6: Create mechanisms to match people to activities
B4: Specify deadlines for tasks, while leaving people in control
[A4, A5, A6, B1, B2, B4, B5, B6, C1, C2, C5, E5, F3]
I. Self-Organization
II. Sustainable On-Line Communities
III. Opening Science Process
AMCIS
eScience
11. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 11
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Task Status (Progress Estimation)
PIHM model Documentation
Calibrating …90%
85%
80%
70% 80% 90%
90%
Low-level Task
Low uncertainty in estimation
Estimated by users
90%
90%
90%
Medium-level Task
Medium uncertainty in estimation
Average of its Subtasks
High-level Task
High uncertainty in the estimation
Linear progress based
on start and target date
90%
95%
Task metadata is used
to estimate the progress
and status of tasks
Legend
The abstraction level of a parent
node/task can be equal or higher
but not lower.
Rule:
Organic Data Science Wiki
12. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 12
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Task Navigation3
Organic Data Science Wiki
Task Navigation3
B1: Make it easy to see and
track needed contributions
C3: Put subgroups in the
context of a larger group
F3: Quality inspection with
visibility into intermediate steps
[B1, B4, B10, C1, C2, C3, C4, C5, F3]
I. Self-Organization
III. Opening Science Process
13. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 13
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Timeline Navigation6
Organic Data Science Wiki
Timeline Navigation6
A5: Inactive tasks should have "expected active times“
B5: Give frequent feedback specific to the goals
E5: Advance notice of tasks that are anticipated
[A5, B1, B5, E5, F3]
I. Self-Organization
III. Opening Science Process
14. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 14
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Task Alert7
Organic Data Science Wiki
Task Alert7
B1: Make it easy to see and
track needed contributions
B4: Specify deadlines for tasks,
while leaving people in control
[B1, B4]
I. Self-Organization
II. Sustainable On-Line Communities
15. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 15
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
User Tasks and Expertise9
Organic Data Science Wiki
User Tasks and Expertise9
B2: Ask specific people on
tasks of interest to them
C1: Cluster members to help them
identify with the community
C5: Interdependent tasks increase
commitment a. reduce conflict
[B1, B2, B5, B8, B10, C1, C5]
I. Self-Organization
II. Sustainable On-Line Communities
III. Opening Science Process
16. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 16
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Task State10
Organic Data Science Wiki
Task State
B1: Make it easy to see and track needed contributions
B5: Give frequent feedback specific to the goals
E5: Advance notice of tasks that are anticipated
[B1, B5, E5, F3]
I. Self-Organization
II. Sustainable On-Line Communities
III. Opening Science Process
10
17. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 17
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Train New Members11
Organic Data Science Wiki
Train New Members
D3: Introducing newcomers to
members increases interactions
D8: Design common learning
experiences for newcomers
D9: Design clear sequence of
stages to newcomers
[D2, D3, D4, D8, D9, D10, D11, D12, E3, E4]
II. Sustainable On-Line Communities
III. Opening Science Process
11
18. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 18
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Mapping Features, Objectives and Social Principles
Welcome Page A1, A2, A3, B7, D1, D5, D6, D7, E2, E6, F1, F2, E4
Task Representation A3, A4, A6, B1, B3, B10, C2, C3, C4, C5, E1, E5, F3
Task Metadata A4, A5, A6, B1, B2, B4, B5, B6, C1, C2, C5, F3
Task Navigation B1, B4, B10, C1, C2, C3, C4, C5, F3
Personal Worklist A4, B1, B4, C3
Subtask Navigation B1, B5, B9, B10 C5, F3
Timeline Navigation A4, A5, B1, B5, E5, F3
Task Alert B1, B4
Task Management A3, B3, B10, F3
User Tasks and Expertise B1, B2, B5, B8, B10, C1, C5
Task State B1, B5, E5
Training New Members D2, D3, D4, D8, D9, D10, D11, D12, E3, E4
Social Principles
and Best Practices
Organic Data Science
Features
11
Organic Data Science Wiki
19. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 19
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Is the Framework Helping Users Organize their Work?
13
33
27
32
11
6
0 1 2 3 4 5
NumberofTasks
(a) Number of ancestors
31
3 2
61
4
1
0 1 2
(b) Num
Tasks
Tasks
Subtask Hierarchies
Evaluation
10 Weeks:
122 Tasks
Task pages accessed
2,900 times
Person pages
accesses 328 times
One paper was
written with the
ODS framework
19,000 log entries
used for evaluation
20. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 20
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Is the Framework Helping to Create Communities?
100
90
80
70
60
50
40
30
20
10
0
PercentageofTasks
Number of
Different Persons
14%
3%
14%
21%
48%
31T
8T
32T
47T
111T
1%
13%
32%
30%
24%
1T
14T
36T
33T
27T
0%
1%
2%
16%
81%
0T
1T
3T
25T
126T
1%
0%
1%
10%
88%
2T
0T
3T
25T
210T
T = Number of Tasks
How many
tasks are viewed
by more than
one person?
How many tasks
have more than
one person
signed up?
How many tasks
have more than
one person editing
task metadata?
How many tasks
have more than
one person editing
their content?
Evaluation
21. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 21
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Is the Framework Helping to Open the Science Processes?
Evaluation
Organic Data Science
Collaboration Graph
Number of tasks in common
= edge strength
22. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 22
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Conclusions
Conclusion
The Organic Data Science Framework provides
a task-centered organization
incorporates social design principles
open exposure of scientific processes
Future work: Analyzing the evolution of
the communities in quantitative terms.
23. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 23
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Thank You
https://github.com/IKCAP/organicdatascience
Organic Data Science Framework
http://www.organicdatascience.org/
Development
Acknowledgments
We gratefully acknowledge funding from the US National Science Foundation under grant IIS-1344272.
24. TECHNICAL UNIVERITY OF MUNICH | USC INFORMATION SCIENCES INSTITUTE Felix Michel 24
ApproachIntroductionMotivation Organic Data Science Wiki Evaluation Conclusion
Task Management8
Organic Data Science Wiki
Task Management8
A3: Organize content, people, and
activities into subspaces once
there is enough activity
F3: Quality inspection with visibility
into intermediate steps
[A3, B3, B10, F3]
I. Self-Organization
Hinweis der Redaktion
Galileo, Newton, Darwin and Einstein published fundamental work
single-authorship
Watson and Crick made progress on unscrambling the DNA’s structure
co-authorship
International Human Genome Sequencing Consortiumlarge number co-authors
ATLAS Detector Project at the Large Hadron Collider in CERNthe community as author
R1: Significant organization and coordination
R1.1: Support ad-hoc processes
R1.2: Show responsibilities and commitments
R1.3: Transparent workflow progress
R1.4: Support hierarchical structures of tasks
R2: Maintaining a community over the longer term
R2.1: Highlighting the importance of individuals expertise
R2.2: Supporting the creation of new communities of practice
R2.3: Understandable and adaptable for needs of specific groups
R3: Growing the community based on unanticipated needs
R3.1: Incorporate new participants as their expertise become relevant to the problem
R3.2: Formulating new tasks as the groups make progress and understand what is needed
R3.3: Creating mechanisms to learn about the work of others
The Web was originally developed to support collaboration in science. Although scientists benefit from many forms of collaboration on the Web (e.g., blogs, wikis, forums, code sharing, etc.), most collaborative projects are coordinated over email, phone calls, and in-person meetings. Our goal is to develop a collaborative infrastructure for scientists to work on complex science questions that require multi-disciplinary contributions to gather and analyze data, that cannot occur without significant coordination to synthesize findings, and that grow organically to accommodate new contributors as needed as the work evolves over time. Our approach is to develop an organic data science framework based on a task-centered organization of the collaboration, includes principles from social sciences for successful on-line communities, and exposes an open science process. Our approach is implemented as an extension of a semantic wiki platform, and captures formal representations of task decomposition structures, relations between tasks and users, and other properties of tasks, data, and other relevant science objects. All these entities are captured through the semantic wiki user interface, represented as semantic web objects, and exported as linked data.
, through an interface that supports scientists to organize joint tasks and to easily track where they can contribute and when
through an interface that incorporates principles from social sciences research on successful on-line collaborations, including best practices for retention and growth of the community
through an interface that captures new kinds of metadata about the collaboration so all participants (especially newcomers) can immediately catch up with the work being done
A1: Carve a niche of interest, scoped in terms of topics, members, activities, and purpose [w]
A2: Relate to competing sites, integrate content
A3: Organize content, people, and activities into subspaces once there is enough activity
B7: Stress benefits of contribution,
D1: Members recruiting colleagues is most effective
D5: When small, acknowledge each new member
D6: Advertise members particularly community leaders, include pictures [w]
D7: Provide concrete incentives to early members [w]
E2: Appoint a volunteer to summarize periodicallyE6: Keep few tasks active at any given time, helps focus
F1: Spine of leadership, including a few leading scientists and 1-2 operational project managers, that resolves complex scientific and social problems and has transparent decision making
F2: Written and publicly accessible rules to transfer work between groups, to assign credit when papers are published, to present the work [w]
F4: Export of data and results, integration with existing standards
A3: Organize content, people, and activities into subspaces once there is enough activity
A4: Highlight more active tasks
A6: Create mechanisms to match people to activities
B1: Make it easy to see and track needed contributions [w]
B3: Simple tasks with challenging goals are easier to comply with [w]
B10: People are more willing to contribute: 1) when group is small, 2) when committed to the group, 3) when their contributions are unique
C2: Give subgroups a name and a tagline
C3: Put subgroups in the context of a larger group [w]
C4: Make community goals and purpose explicit
C5: Interdependent tasks increase commitment a. reduce conflict
E1: Permanent URLs for posts and comments, so others can refer to them [w]
F3: Quality inspection w. visibility into intermediate steps
A4: Highlight more active tasks
A5: Inactive tasks should have "expected active times“ [w]
A6: Create mechanisms to match people to activities [w]
B1: Make it easy to see and track needed contributions [w]
B2: Ask specific people on tasks of interest to them
B4: Specify deadlines for tasks, while leaving people in control [w]
B5: Give frequent feedback specific to the goals [w]
B6: Requests coming from leaders lead to more contributions
C1: Cluster members to help them identify with the community
C2: Give subgroups a name and a tagline
E5: Advance notice of tasks that are anticipated
F3: Quality inspection w. visibility into intermediate steps
high-level - such as the major tasks at the project level.
Medium-level - such as activities within the project that are decomposed into several subtasks.
Low-level - small well-defined tasks that can be accomplished in a short time period.
B1: Make it easy to see and track needed contributions [w]
B4: Specify deadlines for tasks, while leaving people in control
B10: People are more willing to contribute: 1) when group is small, 2) when committed to the group, 3) when their contributions are unique
C1: Cluster members to help them identify with the community
C2: Give subgroups a name and a tagline
C3: Put subgroups in the context of a larger group [w]
C4: Make community goals and purpose explicit
C5: Interdependent tasks increase commitment a. reduce conflict
F3: Quality inspection w. visibility into intermediate steps [w]
A5: Inactive tasks should have "expected active times“ [w]
B1: Make it easy to see and track needed contributions
B5: Give frequent feedback specific to the goals [w]
E5: Advance notice of tasks that are anticipated [w]
F3: Quality inspection with. visibility into intermediate steps
B1: Make it easy to see and track needed contributions
B4: Specify deadlines for tasks, while leaving people in control
B1: Make it easy to see and track needed contributions
B2: Ask specific people on tasks of interest to them [w]
B5: Give frequent feedback specific to the goals
B8: Give (small, intangible) rewards tied to performance (not just for signing up) [w]
B10: People are more willing to contribute: 1) when group is small, 2) when committed to the group, 3) when their contributions are unique
C1: Cluster members to help them identify with the community [w]
C5: Interdependent tasks increase commitment a. reduce conflict [w]
B1: Make it easy to see and track needed contributions
B5: Give frequent feedback specific to the goals
E5: Advance notice of tasks that are anticipated
F3: Quality inspection w. visibility into intermediate steps
D2: Appoint people responsible for immediate friendly interactions
D3: Introducing newcomers to members increases interactions [w]
D4: Entry barriers for newcomers help screen for commitment
D8: Design common learning experiences for newcomers [w]
D9: Design clear sequence of stages to newcomers [w]
D10: Newcomers go through experiences to learn community rules
D11: Provide sandboxes for newcomers while they are learning
D12: Progressive access controls reduce harm while learning
E3: Appoint a volunteer to answer questions from newcomers
E4: Low barrier of entry: make it VERY easy to comment
We have specific hypotheses about how the maturity of the project will affect the management of tasks, about how the growth of the communities will affect the amount of on-line coordination that occurs, and about the task structure as the scope of the work increases.
A3: Organize content, people, and activities into subspaces once there is enough activity
B3: Simple tasks with challenging goals are easier to comply with
B10: People are more willing to contribute: 1) when group is small, 2) when committed to the group, 3) when their contributions are unique
F3: Quality inspection w. visibility into intermediate steps