Using KnetMiner to search and visualise the knowledge network of genes involved in neurodegenerative diseases such as Alzheimer, Parkinson and Huntington.
KnetMiner - EBI Workshop 2017
1. Mining biological knowledge networks for
gene-phenotype discovery
Keywan Hassani-Pak
EBI course: Introduction to Omics data integration
March 2017
http://knetminer.rothamsted.ac.uk/
@KnetMiner
2. • Rothamsted is the longest-running
agricultural research station in the world,
providing cutting-edge science and
innovation for more than 170 years.
• Over 450 staff
• Bioinformatics group
• Bioinformatics Analysts
• Software Developers
About us…
3. Agenda for today
Kevin Dialdestoro
Stephanie Brunet
Part I Part II
Keywan Hassani-Pak
Ajit Singh
Monika Mistry
4. • To understand why linking genotype to phenotype is complex
• To learn which information types are useful for candidate gene prioritization
• To understand the concept of knowledge networks/graphs
• To use KnetMiner for the interpretation of your RNA-seq, QTL and GWAS results
• To learn a little bit about neurodegenerative diseases
Learning Objectives for Part I
5. The Genotype to Phenotype Challenge
Genotype
QTL and GWAS
Omics
Includes any ‘omics
Phenotype
Disease
Intelligence
Flowering
Stress tolerance
Biological Knowledge Discovery
Data selection, processing, transformation, integration and
interpretation
6. The approach is generic and works similarly for other species
7. • Free and open source
• Data warehousing using a graph-
database
• Platform to integrate public and private
datasets in various formats
• Provides a GUI, CLI, APIs and workflows
for reproducible data integration
Ondex – Data Integration Platform
Ondex
www.ondex.org
Not covered in this training course!
8. Let’s start with some GWAS data…
http://plants.ensembl.org/biomart
Example Arabidopsis
#SNP=66,816 | #Gene=27,502 | #Phenotype=107
12. • Gene-GO
• Gene-Phenotype
Gene knock-out or overexpression
Text mining publications
• Gene-Publication
• Gene-Pathway
• Gene-Expression
• Protein-Small Molecule
• Homology to other species
… add other open linked data
>800,000 nodes
>3,000,000 edges
Genome-scale knowledge network
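A genome-scale knowledge network is, at heart, a graph of typed concepts (genes, GO terms, phenotypes, publications and so on) joined by typed relations. The minimal Python sketch below illustrates that data structure only; the class, method and relation names are hypothetical, the example links are invented, and it is not the Ondex data model or API.

```python
from collections import defaultdict


class KnowledgeNetwork:
    """Toy knowledge network: typed concepts (nodes) joined by typed relations (edges).

    Hypothetical illustration only; not the Ondex data model or API.
    """

    def __init__(self):
        self.concept_type = {}             # concept id -> concept type, e.g. "Gene"
        self.links = defaultdict(set)      # concept id -> {(relation type, other concept id)}

    def add_concept(self, cid, ctype):
        self.concept_type[cid] = ctype

    def add_relation(self, source, rtype, target):
        # Store the relation at both ends so neighbours can be looked up from either side
        self.links[source].add((rtype, target))
        self.links[target].add((rtype, source))

    def neighbours(self, cid, ctype=None):
        """Concepts directly linked to cid, optionally restricted to one concept type."""
        return sorted({
            other for _, other in self.links[cid]
            if ctype is None or self.concept_type.get(other) == ctype
        })

    def size(self):
        """(number of concepts, number of relations); each relation is stored twice."""
        return len(self.concept_type), sum(len(s) for s in self.links.values()) // 2


# Made-up example links, not real KnetMiner data
net = KnowledgeNetwork()
net.add_concept("APP", "Gene")
net.add_concept("neuron death", "BioProc")
net.add_concept("Alzheimer disease", "Trait")
net.add_concept("Pub1", "Publication")
net.add_relation("APP", "participates_in", "neuron death")
net.add_relation("APP", "associated_with", "Alzheimer disease")
net.add_relation("APP", "mentioned_in", "Pub1")
print(net.size())                     # (4, 3)
print(net.neighbours("APP", "Trait"))  # ['Alzheimer disease']
```

Filtering neighbours by concept type mirrors the idea of asking only for, say, the phenotypes or publications linked to a gene.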
13. • Progressive loss of structure or function of
neurons, including death of neurons
• Many types including Alzheimer, Parkinson,
Huntington…
• Many similarities between these diseases on
a sub-cellular level
• Discovering these similarities offers hope for
therapeutic advances that could ameliorate
many diseases simultaneously
Neurodegenerative Diseases
14. • Use OMIM advanced search
Query: alzheimer parkinson huntington
Tick: Search in Title
Tick: MIM Number Prefix: “# phenotype”
• Download results as Tab-delimited file
• Copy MIM ids without the prefix “#”
• Use UniProt Retrieve/ID mapping
Provide your MIM identifiers
Select option: From MIM to UniProtKB
Press Go and download all proteins in
XML format (compressed)
Tutorial data – based on 33 human genes
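The manual step of copying MIM ids without their "#" prefix can also be scripted. The sketch below assumes that the prefixed MIM number sits in the first column of the downloaded tab-delimited file, which you should check against the real export; the function name and the sample rows (placeholder numbers) are hypothetical. The mapping from MIM to UniProtKB is still done with UniProt Retrieve/ID mapping.

```python
import csv
import io


def phenotype_mim_ids(tsv_text, column=0):
    """Return MIM numbers of '#'-prefixed (phenotype) entries, without the '#'.

    Assumption: the prefixed MIM number sits in `column` of each tab-delimited row.
    Rows whose field is not '#' followed by digits (comments, headers, '*' gene
    entries) are skipped.
    """
    ids = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) <= column:
            continue  # blank or short line
        field = row[column].strip()
        number = field[1:].strip()
        if field.startswith("#") and number.isdigit():
            ids.append(number)
    return ids


# Placeholder numbers, not real OMIM entries
example = "# Search results\nMIM Number\tTitle\n#111111\tPhenotype A\n*222222\tGene B\n#333333\tPhenotype C\n"
print(phenotype_mim_ids(example))  # ['111111', '333333']
```

Requiring digits after the "#" keeps comment lines, which also start with "#", out of the ID list.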
15. Integration of public datasets
Public Databases
Quantitative data
Interaction data
Omics Data
Datasets and workflows: https://github.com/Rothamsted/ondex-knet-builder
17. • Methods needed to evaluate millions of
relationships in knowledge network, prioritize
genes and extract relevant subnetworks
• Interactive and exploratory tools needed to
enable knowledge discovery and decision
making
• Interpretation should be the task of domain
experts, i.e. biologists!
How to search and interpret too much information?
18. Web Browser
Server
Servlets and JSP Page
Java Socket
Knowledge
Graph DB
Ondex API
DHTML
JavaScript
Apache Tomcat
Multithreaded
Java Server
HTML, JSON, XML and images
over HTTP via Ajax
Views
Java Socket
Client
KnetMiner System Overview
23. Example 1
1. Search terms: "cell death" OR apoptosis
2. Open Query Suggestor
3. Click on cell death tab
4. Replace with neuron death
5. Does it change the number of
documents and genes that can be found?
Exercise 1 – Search Interface
Video 1
24. Gene View - Ranked genes and evidence summaries
25. 1. Uses TF*IDF to rank documents by their relevance to a search term
2. Uses the properties of gene-evidence networks such as
the specificity of documents to a gene
the frequency of evidence concepts
3. Calculates Knet-Score for every gene
Smart pre-indexing of the knowledge network makes the computation of
the score very fast
Gene Ranking
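The exact Knet-Score formula is not given on this slide, so the sketch below only illustrates the two ingredients it names: TF*IDF relevance of documents to a search term, and weighting evidence by how specific it is to a gene. The gene-scoring rule used here (each document shares its score equally among the genes it is linked to) is a hypothetical stand-in, not the published Knet-Score, and the documents and gene links are invented.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Score each document (a list of tokens) by the summed TF*IDF of the query terms."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))       # count each term once per document
    scores = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:               # absent term contributes nothing
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term])
                score += tf * idf
        scores[doc_id] = score
    return scores


def toy_gene_scores(doc_scores, doc_to_genes):
    """Hypothetical gene score: each document shares its TF*IDF score equally among
    the genes it is linked to, so documents specific to one gene weigh more."""
    gene_scores = Counter()
    for doc_id, score in doc_scores.items():
        genes = doc_to_genes.get(doc_id, [])
        for gene in genes:
            gene_scores[gene] += score / len(genes)
    return gene_scores.most_common()       # ranked (gene, score) pairs


# Made-up documents and gene links
documents = {
    "d1": ["apoptosis", "neuron", "death"],
    "d2": ["apoptosis", "kinase"],
    "d3": ["flowering", "time"],
}
doc_to_genes = {"d1": ["APP"], "d2": ["APP", "MAPT"], "d3": ["PRNP"]}
ranking = toy_gene_scores(tf_idf_scores(["apoptosis"], documents), doc_to_genes)
print([gene for gene, _ in ranking])  # ['APP', 'MAPT', 'PRNP']
```

In this toy run APP ranks first because it is the only gene linked to d1, so it receives that document's whole score, while d2's score is split between APP and MAPT.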
26. Network View – Interactive network visualization
• Enlarge
• Show all
• Re-layout
• Info Box
Add hidden nodes
and edges
27. Example 2
Exercise 2: Gene View Network
Search terms: Alzheimer OR Parkinson OR Heparin OR "cell death"
Gene List: APP, MAPT, PRNP
1. Click on the APP gene which loads the Network View
2. Open the Info Box
3. Click on different Concept and Relation types
4. Check their attributes and click on links to external databases
5. Explore all direct and indirect paths from APP to Alzheimer and
Parkinson
6. Hide Publication concepts
7. Show all drugs that can target the APP interaction network
8. Go back to Gene View and select Known targets
9. Click View Network and find out if APP, MAPT and PRNP
interact, are differentially expressed and have GWAS data
Video 2
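Step 5 asks for all direct and indirect paths between a gene and a disease. One generic way to enumerate such paths is a breadth-first search over simple paths (no node visited twice) with a cap on path length. The sketch below runs on a made-up graph and is an illustration of the idea, not how KnetMiner computes its Network View; `paths_between` and `max_len` are hypothetical names.

```python
from collections import deque


def paths_between(graph, start, goal, max_len=3):
    """Enumerate simple paths from start to goal with at most max_len edges (breadth-first)."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            found.append(path)
            continue
        if len(path) - 1 == max_len:
            continue                       # length cap reached, do not extend further
        for nxt in graph.get(node, ()):
            if nxt not in path:            # keep paths simple (no revisits)
                queue.append(path + [nxt])
    return found


# Made-up undirected graph stored as adjacency lists (each link listed at both ends)
graph = {
    "APP": ["Alzheimer disease", "MAPT", "neuron death"],
    "MAPT": ["APP", "Alzheimer disease", "Parkinson disease"],
    "neuron death": ["APP", "Parkinson disease"],
    "Alzheimer disease": ["APP", "MAPT"],
    "Parkinson disease": ["MAPT", "neuron death"],
}
for p in paths_between(graph, "APP", "Alzheimer disease"):
    print(" -> ".join(p))
```

The length cap matters: in a network with millions of relations the number of indirect paths grows very quickly with path length.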
28. Example 3
Exercise 3: Evidence View Network
Search terms: Alzheimer "cell death"
1. Go to Evidence View
2. Sort table by column GENES
3. Find GO concept downregulation of neuron death
4. Find GO concept upregulation of apoptosis
5. Click on number of genes linked to these terms
6. In Network View, show labels for Gene and GO concepts
7. What’s the evidence linking genes to selected GO terms?
Video 3
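Sorting the Evidence View by its GENES column amounts to inverting gene-to-evidence links and counting how many genes each evidence term is linked to. The sketch below shows that inversion on hypothetical gene names (GeneA, GeneB, GeneC) with invented links; it illustrates the idea and is not KnetMiner's implementation.

```python
from collections import defaultdict


def evidence_table(gene_evidence):
    """Invert gene -> evidence links into rows (evidence term, number of genes, genes),
    sorted by gene count, largest first, like sorting an evidence table by GENES."""
    genes_per_term = defaultdict(set)
    for gene, terms in gene_evidence.items():
        for term in terms:
            genes_per_term[term].add(gene)
    rows = [(term, len(genes), sorted(genes)) for term, genes in genes_per_term.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))  # ties broken alphabetically
    return rows


# Hypothetical genes and invented links
gene_evidence = {
    "GeneA": ["downregulation of neuron death", "Alzheimer disease"],
    "GeneB": ["downregulation of neuron death"],
    "GeneC": ["Alzheimer disease", "upregulation of apoptosis"],
}
for term, n_genes, genes in evidence_table(gene_evidence):
    print(f"{term}\t{n_genes}\t{', '.join(genes)}")
```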
29. Map View – Interactive map of chromosome, gene, SNP and QTL data
• Show
network
• Enlarge
• Reset
• Settings
GWAS studies
30. Example 4
Exercise 4: Map View Network
Search terms: Parkinson "cell death"
QTL: Chromosome 12 :: 35000000 - 44000000
1. Go to Map View
2. Toggle Full Screen
3. Zoom into Chromosome 12 and find your QTL
4. Find one or several genes that are in close proximity to
GWAS SNPs
5. Select one or more genes, e.g. LRRK2, PRNP and EIF4G1
6. Launch Network View
7. Study the network and how the genes are connected
Video 4
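Finding candidate genes inside the QTL used in this exercise (Chromosome 12, 35,000,000 to 44,000,000 bp) is an interval-overlap test. The sketch below uses hypothetical gene names and made-up coordinates; with real data the gene positions would come from a genome annotation source such as Ensembl.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_in_qtl(genes, chrom, qtl_start, qtl_end):
    """Names of genes on chrom whose span overlaps the closed interval [qtl_start, qtl_end]."""
    return [
        g.name for g in genes
        if g.chrom == chrom and g.start <= qtl_end and g.end >= qtl_start
    ]


# Hypothetical genes with made-up coordinates
genes = [
    Gene("GENE_A", "12", 34_990_000, 35_010_000),  # straddles the QTL start
    Gene("GENE_B", "12", 40_000_000, 40_100_000),  # fully inside
    Gene("GENE_C", "12", 45_000_000, 45_050_000),  # outside
    Gene("GENE_D", "3", 40_000_000, 40_010_000),   # other chromosome
]
print(genes_in_qtl(genes, "12", 35_000_000, 44_000_000))  # ['GENE_A', 'GENE_B']
```

Using overlap rather than full containment keeps genes that only partly fall inside the QTL boundaries.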
31. • Web application for very fast search of
large genome-scale knowledge graphs
• Ranking of candidate genes based on
knowledge mining
• Interactive visualisation of genome
and knowledge maps
• Facilitates hypothesis validation and
generation
KnetMiner – Making Gene Discovery Efficient & Fun
http://knetminer.rothamsted.ac.uk/
32. • You like KnetMiner but you might be asking…
What if I’m interested in a different disease?
What if I’m interested in a different species?
What if I want to integrate my own private data?
What if I don’t have a server to run KnetMiner?
• As part of an Innovate UK project we are working with Genestack to address these
questions by integrating KnetMiner tools into the Genestack Bioinformatics Platform.
• Next: We will teach you how to use Genestack to build your own networks and deploy
your own KnetMiner application
Objectives for Part II
33. Acknowledgements
John Doonan
Sergio Feingold
Martin Castellote
Uwe Scholz
Matthias Lange
Andy Law
Keywan Hassani-Pak
Ajit Singh
Marco Brandizi
Monika Mistry
Lisa Lill
Chris Rawlings
Dave Edwards
Philipp Bayer
Misha Kapushesky
Kevin Dialdestoro
@KnetMiner
Speaker notes
Many phenotypes are complex and polygenic, the result of complex interactions at the cellular level
Linking genotype and phenotype is one of the greatest challenges in biology
SNP-Phenotype relations (122,919 relations) of significant SNPs (as defined by Ensembl, p-value<0.05?) linked to 107 phenotypes; on average 1,150 SNPs per phenotype.
SNP-Gene relations are based on genes in close proximity to SNPs <1000bp (96,047 relations)
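The SNP-Gene rule in these notes links a SNP to genes lying less than 1000 bp away. A minimal sketch of that rule on made-up coordinates, assuming the distance is 0 when the SNP falls inside the gene; the function name and data layout are hypothetical.

```python
def snp_gene_links(snps, genes, max_distance=1000):
    """Link each SNP to genes on the same chromosome lying closer than max_distance bp.

    snps:  list of (snp_id, chrom, position)
    genes: list of (gene_id, chrom, start, end)
    Distance is 0 when the SNP falls inside the gene.
    """
    links = []
    for snp_id, s_chrom, pos in snps:
        for gene_id, g_chrom, start, end in genes:
            if s_chrom != g_chrom:
                continue
            distance = max(start - pos, pos - end, 0)
            if distance < max_distance:
                links.append((snp_id, gene_id, distance))
    return links


# Made-up SNPs and genes
snps = [("snp1", "1", 1500), ("snp2", "1", 5000), ("snp3", "2", 1500)]
genes = [("geneX", "1", 2000, 3000), ("geneY", "1", 5200, 6000)]
print(snp_gene_links(snps, genes))  # [('snp1', 'geneX', 500), ('snp2', 'geneY', 200)]
```

The nested loop is fine for illustration, but for the full Arabidopsis dataset (66,816 SNPs and 27,502 genes) it would mean roughly 1.8 billion comparisons, so a real pipeline would group by chromosome and use sorted positions or an interval index.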
How to integrate GWAS and biological interaction data