13. [Chart: number of human pathways and number of unique human genes]
http://www.wikipathways.org
14. [Chart: number of users and contributors]
~22%
Over 1 million pageviews
by 280,000 unique visitors
http://www.wikipathways.org
15. Set Early Milestones
• Online (Mar ’07) Success!
• First unknown user (Jan ’08)
16. Don’t Try to Change the World
Work with (not against) established:
• Models
• Communities
• Tools and pipelines
• Publishing models
17. Go Ahead, Change the World
• Tweak established models
• Grow communities
• Change perspectives
• everyone is a curator
• knowledge should be open
18. Go Ahead, Change the World
• Tweak established models
• Grow communities
• Change perspectives
• New attribution systems
• redefine “publication”
• redefine “productive”
19. Go Ahead, Change the World
• Tweak established models
• Grow communities
• Change perspectives
• New attribution systems
• New analysis pipelines
• connect with other community-curated resources
21. Acknowledgements
Thomas Kelder
Martijn van Iersel
Kristina Hanspers
Martina Kutmon
Andra Waagmeester
Chris Evelo
Bruce Conklin
nrnb.org
wikipathways.org
Editor's Notes
Today I’m going to talk about WikiPathways and how to change the world. Well, not really the whole world; more like a very small corner of it. Earth is about 12k km in diameter, while the biological pathways that one would change at WikiPathways are more on the order of tens of microns. So, I mean, *very* small corner.
But like the other talks in this session, this talk is also about community intelligence and changing the way people think about databases and how individual scientists can communicate and share knowledge. The typical scientist is a couple of meters tall, so that adds up to a slightly larger “small corner of the world”. [I apologize for the depiction of a scientist, but if you google “scientist”, this is one of the top hits. We clearly have more work to do on our public image.]
Here, for example, is a typical pathway from WikiPathways. Like most textbook pathways, it depicts proteins and metabolites, reactions and complexes, and their localization into subcellular compartments. But each one of these rectangles is also a data object connected to a database of standard identifiers that can be mapped to a variety of datasets.
Here, for example, we’re seeing differential expression data with up- and down-regulated genes in yellow and blue. When a biologist looks at this, something very special happens. A little movie is triggered in their mind… the room goes quiet and their focus is drawn to a particular area of interest; they think about kinetics, rate-limiting factors, conditions and timing; they consider a number of "what if" scenarios: "what if this cascade of events… could be blocked by increasing this factor?" This is in fact exactly what happens when we take statin drugs to lower our cholesterol. What I’m trying to illustrate here with PowerPoint tricks is the act of visualization…
The data-mapped image is really just sitting there and all of this is just going on in the mind of the researcher, right? The researcher takes in this visual data, which allows it to mix with all the other associations up there from prior observations they’ve made (the majority of which have not been published or put into textbooks) and from conversations they’ve had with colleagues (at conferences like this). This wouldn’t be necessary if we could parameterize all these subtle associations and model them all in a supercomputer. Then we could just mix in all known interactions, molecular concentrations and kinetic rates, and just read out the answers to our questions in concrete units of information. But alas, we can’t do this (at least not yet), so even though this is a conference on intelligent systems, I’d argue that there are a number of situations where humans are actually still really important in data analysis.
Returning now to this basic unit of pathway visualization, the pathway diagram. How exactly are these models constructed? They do not come from direct measurement. They are assembled from a wide variety of data types and assays, a curated set of observations left intentionally sparse. This pathway, for example, shows the mechanism of a common cancer drug called 5-FU: it's metabolized in the liver, its byproducts enter the bloodstream and are taken up by cancer cells, where they disrupt key pathways for cell survival. But this pathway does not represent all that we know about this process, and new data about these components and their interactions continues to pour in with no end in sight. So, all we know for sure is that this model will change over time as we fill in details and learn what’s most relevant.
This is exactly what we had in mind when we started the WikiPathways project. It's a wiki, like Wikipedia, but what we did is we ripped out the text editor and replaced it with our own pathway drawing tool. So anyone can find a pathway, click 'edit', and then add new information [[like a new byproduct of 5-FU that also goes to cancer cells and triggers apoptosis]]. You then click ‘save’ and your changes are immediately available to the rest of the world. You can provide literature references to cite evidence for your changes. And the entire research community is your peer review group: they can approve or undo your changes. In this way, we can keep up with the flood of new data relating to biological processes.
When you’re editing a pathway, you are not only editing the diagram, you’re also editing a standard XML file with BioPAX elements that can be exchanged and accessed programmatically. For the software developers in the audience, in addition to this XML format, there is also web service access to the pathway content. You can programmatically return pathway images with highlighted nodes, for example. There is embed code to insert interactive pathway widgets into your own web sites, and we are starting to represent our pathway content as linked data to support semantic queries. The most common workflow today is to import the XML into tools like PathVisio and Cytoscape.
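As a rough illustration of what "accessed programmatically" can look like, here is a minimal sketch that reads node labels and database identifiers out of a pathway XML document using only the Python standard library. The XML snippet is a hypothetical, simplified stand-in (no namespaces, no layout information), and the function name `extract_nodes` is my own; the real WikiPathways file format and web service are richer than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pathway XML for illustration only; real files
# carry namespaces, layout coordinates, and many more attributes.
PATHWAY_XML = """
<Pathway Name="Example pathway" Organism="Homo sapiens">
  <DataNode TextLabel="HMGCR" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="3156"/>
  </DataNode>
  <DataNode TextLabel="Cholesterol" Type="Metabolite">
    <Xref Database="ChEBI" ID="16113"/>
  </DataNode>
  <DataNode TextLabel="Unannotated" Type="GeneProduct"/>
</Pathway>
"""


def extract_nodes(xml_text):
    """Return (label, type, database, identifier) for each annotated node."""
    root = ET.fromstring(xml_text)
    nodes = []
    for node in root.iter("DataNode"):
        xref = node.find("Xref")
        if xref is None or not xref.get("ID"):
            continue  # skip nodes that have no database identifier
        nodes.append((node.get("TextLabel"), node.get("Type"),
                      xref.get("Database"), xref.get("ID")))
    return nodes


if __name__ == "__main__":
    for label, node_type, database, identifier in extract_nodes(PATHWAY_XML):
        print(f"{label} ({node_type}): {database} {identifier}")
```

The identifiers are what make each rectangle a data object: once you have the (database, identifier) pairs, you can join them against your own dataset.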
After loading a pathway into Cytoscape, for example, you can then import your own dataset from an Excel spreadsheet and define how you want your data to map in terms of color gradients. This process is dynamic and interactive, so you can explore your dataset in the context of these pathways. And, of course, in Cytoscape you can make use of all the other apps that are available to calculate shortest paths, perform clustering or over-representation analysis.
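To make the idea of a color gradient mapping concrete, here is a small sketch in plain Python. It is not how Cytoscape implements its visual styles; it only shows the underlying arithmetic, assuming log2 fold-change values, a white midpoint, and the yellow (up) and blue (down) scheme from the earlier slide. The function names and the clamp limit of 2.0 are hypothetical choices.

```python
def lerp(start, end, t):
    """Linearly interpolate between two channel values and round to an int."""
    return round(start + (end - start) * t)


def fold_change_to_hex(log2_fc, limit=2.0):
    """Map a log2 fold change to a hex color.

    Down-regulated values shade toward blue, up-regulated values toward
    yellow, and zero maps to white. Values beyond +/- limit are clamped.
    """
    blue, white, yellow = (0, 0, 255), (255, 255, 255), (255, 255, 0)
    clamped = max(-limit, min(limit, log2_fc))
    t = abs(clamped) / limit  # 0.0 at the midpoint, 1.0 at either extreme
    target = yellow if clamped > 0 else blue
    rgb = tuple(lerp(w, c, t) for w, c in zip(white, target))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    # Hypothetical expression values keyed by gene symbol.
    dataset = {"GENE_A": 2.5, "GENE_B": -1.0, "GENE_C": 0.0}
    for gene, value in dataset.items():
        print(gene, value, fold_change_to_hex(value))
```

Clamping keeps a few extreme values from washing out the rest of the gradient, which matters when you are scanning a pathway by eye.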
Putting it all together, you can begin to see how we are feeding into this virtuous cycle. Data is synthesized into pathway diagrams. And orthogonal data can be mapped onto these pathway models. Computational analysis, together with the act of visualization, can lead to new explanations and new ideas. And finally, these new ideas can be tested to generate new data, bringing us back to synthesis. And I have to say, the wiki model is really working well here…
We’ve been collecting and curating pathways since 2001. In the years just prior to launching WikiPathways we really struggled relying on our internal curation team alone. [This was our growth curve for number of pathways in blue and number of unique genes on those pathways in green]. In the years following the launch of WikiPathways we experienced a whole new level of growth. And this last year things are really starting to take off. I might have to start using logarithmic plots for the number of pathways. It’s difficult to quantify the effect on quality, but our curation team has been thrilled by the quality and overall improvements we’re seeing in the content. Basically, no internal team can curate all of biology; this task can only be done by a distributed system.
And in terms of participation, not only has the number of users increased since our launch in 2008, but so has the number of contributors, averaging around 22%. Putting pathway editing and curation tools into the hands of researchers is the best (and only) way to keep up with the flood of new data coming in; and they are actually using them! You only need to register if you want to edit, so we also have lots of folks viewing and downloading pathways: over 1 million pageviews by over a quarter million unique visitors. And these numbers don’t include access through Cytoscape, embed code and web services. I know these numbers don’t compare to Wikipedia, but come on, we’re talking about biological pathways here: a niche market within the niche market of systems biology.
I’ll wrap up now with a specific example of a new curation workflow we are working on for WikiPathways. Imagine that while you are editing a pathway, you’ve just added this carboxylase and this panel updates with interactions and annotations specific to that protein. Here, for example, are paths 1, 2 and 3 steps away that you could drag into the pathway. This would also automatically bring in references and evidence for the interactions from another resource. Likewise, any novel interactions drawn and cited in WikiPathways could be shared back to that resource and passed on to all the other tools that make use of it. This type of interoperability will be easy with web-based, semantic data collections, where the formats and content are openly defined and shared.
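The "1, 2 and 3 steps away" lookup can be sketched as a breadth-first search over an interaction network. This is only an illustration of the idea, not the actual WikiPathways panel: the adjacency dictionary, the node names, and the function `neighbors_by_distance` are all hypothetical, and a real implementation would query an external interaction resource rather than an in-memory dict.

```python
from collections import deque


def neighbors_by_distance(graph, start, max_steps=3):
    """Group nodes by their shortest distance (1..max_steps) from start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_steps:
            continue  # do not expand beyond the requested radius
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    by_step = {}
    for node, distance in distances.items():
        if distance > 0:
            by_step.setdefault(distance, []).append(node)
    return by_step


if __name__ == "__main__":
    # Hypothetical interaction network keyed by node name.
    interactions = {
        "carboxylase": ["A", "B"],
        "A": ["C"],
        "B": ["C"],
        "C": ["D"],
        "D": ["E"],
    }
    for step, nodes in sorted(neighbors_by_distance(interactions, "carboxylase").items()):
        print(step, nodes)
```

Breadth-first order guarantees each node is reported at its shortest distance, so a protein reachable in both one and three steps shows up only in the one-step group.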
I'd like to acknowledge the teams of developers I work with on WikiPathways. I'm also affiliated with NRNB, the National Resource for Network Biology. Part of our mission is to promote the development and use of network biology tools and resources. If you are interested in developing WikiPathways or Cytoscape, for example, let us know. One way we coordinate this is through the annual Google Summer of Code program, where Google pays students from around the world to write open source code for our projects. You can find out more at nrnb.org. Thank you.