If We Build It, Will They Come
1. If we build it
will they come?
Prof Carole Goble FREng FBCS CITP
carole.goble@manchester.ac.uk
BOSC, Long Beach, July 14 2012
http://www.mygrid.org.uk
2. Est. 2001
Improving Knowledge Turning,
Enabling Reuse and Reproducibility
[Josh Sommer]
Keep the vision, modify the plan
3. Computational Methods LGPL
Scientific workflows.
Distributed web/grid/cloud services
Third party, independent service reuse
Data pipelines and analytics
Volunteerist Human Computation BSD
e-Laboratories - social collaboration
and sharing environments for scientific
artefacts. Libraries and Catalogues.
Asset safe havens, sharing, reuse.
Knowledge Acquisition Tools
Various
Semantic technology, semantic
applications, research objects,
executable papers.
OWL
Data/Metadata curation & reuse
POPULOUS SKOSEdit
4. The Taverna Suite of Tools Web Portals
Workflow Repository GUI Workbench Client User Interfaces
Virtual
Machine
Service Catalogue Third Party Tools
Workflow Engine
Provenance Workflow
Store Command Line
Server
Activity and Service
Plug-in Manager
Open
Provenance
Model
Programming and
Secure Service Access APIs
5. Community Haven
Sharing Resource
Social Collaboration
http://www.myexperiment.org
5820 members, 304
groups, 2415 workflows,
604 files and 229 packs
(research objects)
http://wiki.myexperiment.org/index.php/Galaxy
6. BioCatalogue:
crowd curation of web services
Contribute, Find and
understand Web
Services
Curate, review and
comment
Learning resource
Monitor Services Cloud Registry
2295 REST and SOAP services, 169 service
providers. 674 members, 27 countries
7. Find experts,
colleagues and
peers.
Find, exchange
and interlink,
preserve, publish
data, models,
publications,
SOPs & analyses.
ISA Compliant
SysMO: 16 consortia, 110 institutes,
1600+ assets, 350+ members
Launch and validate models and analyses: JWS Online
Gateway to public tools and resources, e.g. BioModels
GerontoSys, livSYSiPS
9. Standards & Content Sharing Platform
Governance & Policy & Trusted Service
Software
& Tools
Open source
Gateway
Comp Sci
Research
Platform
Knowledge Network Preservation &
Skills & Community Building Publication Platforms
10. Laissez-faire Philosophy
• Bottom Up
– Emergent & scruffy (to a degree…)
• Reliant on third party contributions
– Non-prescriptive, non-interfering and
flexible
– We make no content ourselves….
• Part of a wider ecosystem
– Other services, data, tools, platforms,
people…
• Inspired by social environments
• Scarred by top-down, dictated,
tech-driven and unused monoliths
11. http://www.flickr.com/photos/hellaoakland/3137360455/
Never underestimate how scruffy third party stuff can be
How often metadata is missing and messy if left to its own devices…
Liberty through Limitations
People say they want flexibility. They prefer the simplicity of order and will adapt to adopt.
12. Who is they?
• Jobbing
Bioinformatician?
• Expert
Bioinformatician?
• Sys admin?
• Service provider?
• Application
developer?
• Tool developer?
• Biologist?
13. Who is THEY?
• Drug Toxicity (OpenTox Project)
• Pharmacogenomics GWAS
• Trypanosomiasis in African Cattle
• The Virtual Liver
• Physiopathology of the human body
• Genetic differences between breeds of cattle
• Systems Biology of Micro-Organisms
• Metagenomics
• Medical Imaging
14. Consortia
Organised, Planned, Strong connections with resource providers and each other.
Bovine Trypanosomiasis Consortium
Research Groups
Independents….
Distributed Groups & Independent Lone rangers
Long tail, Disconnected from data providers and each other, emergent,
Individuals
15. Specialise or Diversify?
• Flexibility and extensibility -> customised Software and Services, Cookie cutter
• Widen adoption
• Spread risk, extend resourcing streams
• Cross development alignment and coordination
• More communities to build, nurture, support and sustain
• Core Drift and Bashing
Document Preservation; Helio-Physics; BioDiversity; Astronomy; Social Science; Engineering: JPL, NASA; FLOSS
16. BioDiversity Virtual e-Laboratory
http://www.biovel.eu
Biodiversity Services Catalogues / Execution
Repositories environment
Provenance
Phylogenetic
BLAST,Hmmer, WebDaV Data
MrBayes, Management
Blast, PAML,
Taverna
EMBOSS,… Workbench
Search
Open
Taxonomic
Synonyms
Visualisation
Authentication /
Authorisation
BioSTIF Taverna
Workflow Engine
Google Refine CSW and Server
Modelling/GeoProcessing
Grid, Cloud, etc.
R
openModeller
Platforms
WPS / WCPS
17. Who is We? The ego-system
biologists,
bioinformaticians,
biodiversity
informaticians,
astro-informaticians,
social scientists
modellers, software
engineers,
computer scientists,
systems administrators,
resource providers
20. Applications
Production
Publishing Training
Research
Community Community
21. So if we build it will they come?
Be useful for something: immediately,
continuously, responsively
Be usable by somebody: user experience,
worth the effort, adoption path
Some of the time: as part of a big picture
Under promise and over deliver
Acquire Critical Mass
22. Four things that drive adoption
of software or service.
1. Added value
– Do something you couldn’t do before, or do it faster: gain competitive advantage, improve productivity, scale up
2. New asset
– Get or retain access to something important (data,
method, technique, skills, knowledge)
3. Keep up with the field. A Community.
– Future-proof my practice, New skills and capacity,
there is a vibe about it and I’ll be left out
4. Because there is no choice
– Business depends on it, it’s mandated, it’s de facto mandated
23. Seven things that hinder
adoption of software or service
1. Not enough added value
• It doesn’t solve a problem, or not as well or as cheaply as something else; no content, or not the right content
It Sucks
2. Not fit for take-on. It doesn’t work!
• No: help, guides, documentation, manuals, examples,
content, templates, portability, migration / legacy
support, easy installation, virtual machines, testing,
stability, version control, release cycle, roadmap,
sustainability prospect, way of introducing my
favourite component/data/environment.
3. No Time or Capacity to take on
• To learn, migrate personal legacy
code/data/applications, no pathway/ramp to adoption
• Training and special system needs
24. Software practices
Zeeya Merali, Nature 467, 775-777 (2010) | doi:10.1038/467775a
Computational science: ...Error…why scientific programming does not compute.
“As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
25. Software Stewardship
“Better Science through Superior Software” – C Titus Brown
Software sustainability
Software practices
Software deposition
Long term access to software
Credit for software
Licensing advice
Open licenses
Reproducible Research Standard, Victoria Stodden,
Intl J Comm Law & Policy, 13 2009
26. Seven things that hinder
adoption of software or service
4. Cost
– Of disruption, of long-term ownership
It’s too costly
5. Exposure to Risk
– First to take-up, support and sustainability dependencies, fear of scrutiny, misrepresentation or being scooped
6. No Community
– Support and comfort
7. Changes to work practices
– Obligations, unclear or unenforced reciprocity protocols.
27. • It sucks but it’s the
only thing around
• It’s ace but it’s one
of many, too late in
the game and not
enough to switch
• Tipping point is
likely not technical
Betamax vs VHS
28. Bonus Hinder
Never heard of it.
We’ve built it but we haven’t told anyone.
• Make noise…physically and virtually
• Customer and Contributor Relationship Building
• Self-supporting communities, multi-level marketing
• Highly Resource Intensive
29. Bonus Hinder
Never heard of it.
We’ve built it but we haven’t told anyone.
Market
User Community
Development
It all kicks off
Developer Community
30. Adoption Intentions
Be careful what you wish for
• Incidental
– “I built it for myself, and stuck it out there”
• Familial
– “I built it for people just like me”
• Fundamental
– “I built it for others, many who are not like me”
31. Open Innovation: Development and Content
you are not alone. you can’t do it all alone
motivate & enable others to fill gaps “App Store Style”
software, services, content, examples….
• Really Interoperate. Don’t tweak.
• Be Simple and Standard.
• Be Helpful. Be Set up. Be reusable. Be Smart
• Others will develop on top of you. But don’t assume they will re-contribute or tell you.
• It’s much harder than you think.
• It’s unequal.
Galaxy+Taverna/myExperiment
Friends, Family, Acquaintances, Strangers
32. Ladder Model of OSS Adoption
(adapted from Carbone P., Value Derived from Open Source is a Function of Maturity Levels)
Family, Friends, Acquaintances, Strangers
Moore's technology adoption curve
[FLOSS@Syracuse]
33. "it's better, initially, to make a small
number of users really love you than a
large number kind of like you"
Paul Buchheit
paulbuchheit.blogspot.com
34. PALS: Building Friendships
Intelligence, Guidance, Advocacy, Evangelism, Market Research
What’s in it for the PAL?
– Long tail: Money, kudos,
special support, special
resources, skills, reputation
building, influence, stuff they
can’t do alone, CV building
– Consortia: co-funded
• Who is a PAL?
– Post-docs, Post-grads,
Administrators, Developers
– PI: protector/champion
• PAL handlers
– Customer Relationship
Manager, Nanny and
Mediator, Scientist
35. Do not under-estimate…
The power of the sprint / *-athon / fest / drinking
The power of a whizzy interface. Even for plumbing.
The importance of supporting and propagating best practice
37. Participatory Design
Work Together on a Real Problem
Funders Project PIs PALs
Data sharing Data control Spreadsheets.
Data standards Own databases Yellow Pages.
Just enough SOPs
A database
exchange. Understanding
Long term Visibility limitations standards
preservation
Project dependence Curating.
Examples.
Safe Haven
Project independence
3 years later, 15/16 consortia abandoned their own systems and went with the SEEK system.
39. Participation Cooperation? Coordination? Collaboration?
Citizens Integration? Evolution and entropy models
Public
scientists
Trusted
Collaborators
Private
Groups
Lone
scholars
Closed Controlled Open
[based on an idea by Liz Lyon] Access
40. Critical mass spiral: 90:9:1
Driven by needs of
and benefits to the
scientist, rather
than top down
policies.
Content tipping
point
[Andrew Su]
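The 90:9:1 ratio is usually read as a participation-inequality rule of thumb: roughly 90% of members only consume, 9% contribute occasionally, and 1% contribute most of the content. As a rough illustration only, assuming that reading and borrowing the 5820-member myExperiment figure quoted earlier in the deck, the split can be worked out as below. The function name and the rounding choice are illustrative assumptions, not something from the talk.

```python
def participation_split(members, ratios=(90, 9, 1)):
    """Split a member count by a lurker:occasional:core ratio.

    Assumption: the 90:9:1 rule of thumb is applied literally; real
    communities will deviate from it.
    """
    total = sum(ratios)
    lurkers = members * ratios[0] // total      # consume only
    occasional = members * ratios[1] // total   # contribute now and then
    core = members - lurkers - occasional       # remainder: heavy contributors
    return {"lurkers": lurkers, "occasional": occasional, "core": core}


# 5820 is the myExperiment member count quoted on the earlier slide.
split = participation_split(5820)
print(split)
```

Under these assumptions only a few dozen people would be producing most of the content, which is why the slide ties critical mass to the needs and benefits of individual scientists rather than to top-down policy.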
41. Trust, Fame and Blame: Reciprocity,
Competition, Contribution and Use
• Scooping, Scrutiny and Misinterpretation
• Curation Cost
• Poor quality
• Reputation / Asset Economics
• Public Peer Pressure
Reciprocity Sucks
• Flirting
• Hugging
• Controlled Sharing
• Voyeurism
• Poor feedback / credit
Nature 461, 145 (10 September 2009)
Victoria Stodden, The Scientific Method in Practice: Reproducibility in the Computational Sciences, Feb 9, 2010, MIT Sloan Research Paper No. 4773-10, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1550193
42. Harness Competitiveness Carrots
Pride
• Reputation: Cult, Credit & Attribution for all
Protection
• Just enough Sharing, Licensing & Liability
• Quality, Peer review, Metadata
Preservation
• Safe havens and Sunsets (project churn)
Publishing / Release
• Citability, Supporting Exchange
Productivity
• Availability of assets, help, capability,
ramps
45. Adoption Stealth
• Data at home promise with
automated harvesting
• Sharing creep, Incremental
metadata, Low obligations
• URL upload in BioCatalogue
• Web Service “come as you
are” take-on in Taverna
• Metadata prompting, Right
tools, right time, right place
• Service collections &
Packaged services
46. Be vigilant
• PAL burn-out and
over familiarity
• Unadjusted over-user accommodation
• Drifting apart and not
keeping it fresh
• Step back, observe
and adapt/intervene!
• So relieved to get a
community….
• Instrument adoption
and observation
Participatory Development is a mutual long term relationship
Not flirty speed dating, One night stand, Crush, Me Me Me
47. Urgent-Important
• Technical bog down,
operational burn-out
• Little things that are
important but don’t
seem that urgent…
• Dominant projects
• Not-software content
• It all takes way longer
than you think
• Simplicity drift
49. The Jam-based
Adoption Model
aka
Added Value
Value Proposition
Return On Investment
http://delicious-cooks.com/photos/raspberry-jam/04/
50. What is the Special Jam?
What is your Jam Value Chain and for Who?
What:
SysMO: safe haven, spreadsheet tooling, linking
SOPs, models and data, examples
Taverna: power, adaptability and myExperiment
Who:
Focused on contributors and experts
Provider-consumer balance
Functionality-Simplicity Syndrome
Changing Who - Challenging baked-ins
51. Jam today and more, better Jam tomorrow
Just Enough Jam, Just in Time not Just in Case
* Feature Creep Conundrum * Big Picture Paradox
* Core vs Specifics Syndrome * Content Decay Dilemma
* Working to working Stability Stress
52. Customised Specific Jam beats Generic
* Flexibility/Functionality – Simplicity Conundrum
* Diversification Dilemma
53. http://www.gettyimages.co.uk/detail/photo/empty-jam-jar-royalty-free-image/136976198
Where is my Jam? Jam for All
• What are WE (platform providers,
Software builders, Community
builders and Service providers)
getting out of it?
• Need credit and interest too.
• Altmetrics
Howison and Herbsleb, Scientific Software Production:
Incentives and Collaboration, CSCW 2011, March 19–23,
2011, Hangzhou, China
http://james.howison.name/pubs/HowisonHerbsleb2011SciSoftIncentives.pdf
54. Jam forever
They came. Have the evidence. Have a plan.
Did you wish for this? Do you want it?
Fragile Flux
• Content, services, bits, communities
Funding Plan
• Novelty over sustainability,
• Research-Production Falsehoods
• Wave invention, Political lobbying
Securing the community
• Leadership & Foundations
Business model???
Software is Free like Puppies Are Free
55. Jam not forever
• Acquire
• Retain
• Widen
– More/Different
• Reposition
– Different/New Stage
• Changing Community
is Challenging… [Daron Green]
56. Adoption is a Merry-Go-Round
The Social and the Technical are Inseparable
57. You know they came when…
…you were useful and usable to someone some of the time,
but they might not tell you
… people ask you to join their consortia or use it
… they gave up their own home grown stuff for yours
… someone you don’t know uses it and tells you all about
your own stuff.
… someone publishes papers about it. Without citing you.
… someone else claims credit.
… people you don’t know start bitching about it.
… it’s just expected to be there and you are kind of expected
to be there too.
…your Head of School complains you don’t do enough CS
research because you are doing too much Software
Engineering and Support.
58. Acknowledgements (1)
James Howison, Heather Piwowar
Victoria Stodden, Janet Vertesi
Christine Borgman, Nosh Contractor
Jay Liebowitz, Robert Kraut
59. Acknowledgements (2)
• The myGrid family, friends and contributors
• But especially: Katy Wolstencroft, David Withers, Marco
Roos, Alan Williams, Jits Bhagat, Stuart Owen, Stian
Soiland-Reyes, Shoab Sufi, Robert Stevens, Paul Fisher,
Peter Li, Ian Dunlop, Finn Bacall, Mannie Tags, Niall
Beard, Rob Haines, Christian Brenninkmeijer, Alasdair
Gray, Tim Clark, Pinar Alper, Paolo Missier, Khalid
Belhajjame, Duncan Hull, Sean Bechhofer, David De
Roure, Don Cruickshank, Wolfgang Mueller, Olga Krebs,
Franco Du Preez, Quyen Nguyen, Jacky Snoep.
• The members of Wf4ever, SysMO, BioVel, HELIO,
SCAPE, OMII, SSI, NeiSS, Obesity e-Lab and anyone
else I forgot
61. SUMMARY
Coalface users, Patrons, Skeptic, Champions, Friends and Family, End Users, Developers, Service Providers, System Administrators
• Keep your Friends Close
• Fit in
• Embed
• Favours will Favour you
• Jam Today, Jam Tomorrow
• Act Local, Think Global
• Just Enough, Just in Time
• Design for Network Effects
• Know your Users
• Anticipate Change
• Enable Users to Add Value
• Keep Sight of the Bigger Picture
(De Roure and Goble, IEEE Software 2009)
Editor's notes
If I build it will they come? What is it we are building? Who is they? Who are we? Over the years I have built a bunch of open source software and services for researchers: the Taverna workflow system, myExperiment for workflow sharing, BioCatalogue for services, SEEK for Systems Biology data and models, and most recently MethodBox for longitudinal data sets. As well as building software we built communities: development communities and user communities. So what drives/hinders adoption? What do I know now that I wish I had known before? How do we sustain communities on time-limited grants? How do we build it so they come, stay and join in?
Distributed Groups Independents and Partners Organised Teams, Planned, Strong connections with resource providers and each other. Structured, Cross-partner sharing, Retained results Distributed Groups & Independent Lone rangers Long tail, Disconnected from data providers and each other, emergent, fluid, personal stores, small science from big Make workflows for group Run workflows from platforms Store and Find Workflows Catalogue and Find Services Catalogue, store and find data, SOPs, Models Link stuff Release & Share stuff Curate stuff Cooperate / Collaborate / Coordinate / CoShape Vary on Coordination, collaboration, cooperation, contribution, integration, sustainability, longevity
Still some people missing!
Knowledge Transfer Three tracks Large Team.
Developer and user adoption Contributed collaborative content Collaborative development
Maybe you don’t care…. Content and Promotion matter more than software, but harder to fund and different people to software developers.
Incidental – not really building for adoption or others to take up Familial – the producer and the consumer are the same – many are like this in BOSC
CLAs for set up. Remember upgrade paths Cooperate, Network effects, Amplify Self-supporting, Multi-level marketing There are no green fields.
Please some of the people some of the time
They all start off like this…
Working the first time User experience over smart. Cool interfaces (even for plumbing)
Primary Community Review Facebook generation! Community participation Sharing Commons based production Social Curation Voluntary contribution 1. Primary Content 2. Curation duties GeneWiki, Rfam, myExperiment, PloS, UsefulChem, OpenWetWare Open Science vs Long Tail Social networks vs the Long Tail Incentives and Obstacles Myths and Miracles Contribution. Curation. Volunteer science
Limited focus Social networking around content . Feedback loops.
PAL recruitment Content contribution Stick: Community, Journal and funder mandates – there is no stick Credit for peer review
Don’t forget to make more demands though!
User burn-out and over-familiarity: over-friendly Stockholm syndrome, absence of friendly fire; keep enemies even closer. Unadjusted over-user accommodation: fit in at first, get buy-in, move in, move on. Drifting apart and not keeping it fresh: keep jointly working on real, concrete cases. Don’t assume they will stay: users are fickle. Step back, observe and adapt/intervene! So relieved to get a community that you forget to see what they do (e.g. dubious workflow designs). Much easier with e-Laboratory Services that are inherently social collaboration spaces. Complacency: especially dangerous outside funded collaborations. Measuring impact and getting feedback: downloads ≠ useful (or usable). Don’t be prescriptive. Scientists control, but actually we need to be a bit prescriptive. Danger! Going native. Missing users. Fossilisation and complacency. User experience over smart. Cool interfaces (even for plumbing). *-athons. Embedded co-working. The total problem. Replying. Eating your own dog food. Examples! Working the first time.
Version 2 Syndrome: being too clever, forgetting about engagement. Technical bog down and operational burn-out: fire fighting, heads down not eyes up. Little simple things that are important but don’t seem that urgent… but are the ha’p’orth of tar that sinks the ship. Major project dominance: he who pays the piper calls the tune. Non-software innovations: seek and contribute content/component and contributing partners.
Activation Energy Argument Balance against feature creep short-termism Keep planning the big stuff… Balance the cost to the benefit. But hacks survive – and don’t do the strategy.
58% by students, 24% unmaintained: Schultheiss et al. (2010) PLoS Comp Bio. Content and Promotion matter more than software, but harder to fund and different people to software developers. What’s your plan? Maintaining content, software, services. Different groups, evolving practices, changing times, new patterns….. Funding cycles, chasms and reinventions. Reward not hinder adoption. Foundations, Friends and Business Models…and the Open Source Community Silver Bullet!
Hard to Plan….
When the program’s Data Management Group chair claims it’s the only data system they have used that works. To your funders. Whoo-hoo!
Computer Supported Cooperative Work, Team Science, Knowledge Management, Social Science, Information Science, Library Science, Digital Scholarship, Collaboratories…