BioSHaRE: Making data useful without direct sharing: Cafe Variome and Omics browsing - Anthony Brookes - University of Leicester

BioSHaRE conference July 28th, 2015, Milan - Latest tools and services for data sharing
Stream 1: Tools for data sharing analysis and enhancement

Café Variome is a highly flexible data discovery platform suitable for use with genomic data and/or phenotype data in settings such as diagnostic networks, disease consortia, biobanks and research communities. It enables users to search for the existence rather than the substance of datasets, and as part of this offers a complete suite of data discovery capabilities, focused on the data rather than metadata. Following data discovery, the system also facilitates controlled data sharing.

‘Café Variome Central’ aims to consolidate all publicly available genetic variants into one discovery portal through which to announce, discover and acquire a comprehensive listing of observed neutral and disease-causing gene variants. It employs publicly available web services to gather and make searchable a set of pointers to records of interest, to help users discover the existence of variant data and direct them to the original data sources where the data may be examined in full.

The software is in production as version 1.0 and is currently available for collaborative applications: http://www.cafevariome.org/

Café Variome can be installed stand-alone, or federated to allow searching across instances while the data remain at the source.
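The federated, existence-only search described above can be illustrated with a minimal sketch. It is not Café Variome's actual code or API: the `Node` class, the `Access` enum, the gene and variant names and the example URLs are all hypothetical. It assumes each node answers a query with a match count and releases records only when its own access setting allows.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    OPEN = "open"              # matching data are returned directly
    RESTRICTED = "restricted"  # the user must request the data from the source
    LINKED = "linked"          # only a link to the data source is returned


@dataclass
class Node:
    """Hypothetical stand-in for one installation holding local records."""
    name: str
    access: Access
    url: str
    records: list = field(default_factory=list)

    def discover(self, gene):
        """Report existence (a count); include records only if access is open."""
        hits = [r for r in self.records if r["gene"] == gene]
        result = {"node": self.name, "count": len(hits), "access": self.access.value}
        if self.access is Access.OPEN:
            result["data"] = hits
        elif self.access is Access.LINKED:
            result["link"] = self.url
        else:
            result["action"] = "request access from the data owner"
        return result


def federated_search(nodes, gene):
    """Ask every node in turn; records stay at the source unless it is open."""
    return [node.discover(gene) for node in nodes]


# Illustrative data only: gene and variant names are placeholders.
nodes = [
    Node("lab_a", Access.OPEN, "https://lab-a.example",
         [{"gene": "GENE1", "variant": "variant-1"}]),
    Node("lab_b", Access.RESTRICTED, "https://lab-b.example",
         [{"gene": "GENE1", "variant": "variant-2"},
          {"gene": "GENE1", "variant": "variant-3"}]),
    Node("lab_c", Access.LINKED, "https://lab-c.example",
         [{"gene": "GENE2", "variant": "variant-4"}]),
]

results = federated_search(nodes, "GENE1")
for row in results:
    print(row)
```

Every node reports how many records match, so the searcher learns that data exist, but only the open node hands over the records themselves.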

OmicsConnect, underpinned by an ‘extended DAS’ (eDAS) protocol for data transfer, feeds data from diverse sources into a genome browser tool and controls which users have access to which data sources and to which data slices in those datasets.

DAS (Distributed Annotation System) is an Extensible Markup Language (XML) communication protocol that allows a single client (e.g. a genome browser) to integrate information from multiple DAS servers dispersed around the world and present a unified view of the data. The eDAS system brings two main advantages. First, the data are controlled by the content providers and can be modified, restricted and updated as required. Second, the data are shared in a way that makes it easy for the end user to get information about specific regions, genes or markers without having to download and process entire datasets.
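As a rough illustration of how a client can combine replies from several DAS servers into one view, the sketch below builds a region-specific `features` request and merges two XML replies. The request form and the `DASGFF`/`SEGMENT`/`FEATURE` element names are assumed to follow the DAS 1.x layout in simplified form. The server names, sample replies and helper functions are hypothetical, and no network access is performed.

```python
import xml.etree.ElementTree as ET


def features_url(server, source, chrom, start, end):
    """Build a DAS-style 'features' request for one genome segment."""
    return f"{server}/das/{source}/features?segment={chrom}:{start},{end}"


def parse_features(xml_text, origin):
    """Extract one dict per FEATURE element from a (simplified) DASGFF reply."""
    root = ET.fromstring(xml_text)
    rows = []
    for seg in root.iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            rows.append({
                "origin": origin,
                "segment": seg.get("id"),
                "id": feat.get("id"),
                "type": feat.findtext("TYPE"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
            })
    return rows


# Hand-written sample replies standing in for two independent servers.
REPLY_A = """<DASGFF><GFF><SEGMENT id="1" start="1000" stop="2000">
<FEATURE id="geneX"><TYPE id="gene">gene</TYPE><START>1100</START><END>1500</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""

REPLY_B = """<DASGFF><GFF><SEGMENT id="1" start="1000" stop="2000">
<FEATURE id="marker7"><TYPE id="snp">snp</TYPE><START>1050</START><END>1050</END></FEATURE>
<FEATURE id="marker9"><TYPE id="snp">snp</TYPE><START>1800</START><END>1800</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""

# The client asks each server only for the region of interest ...
url = features_url("http://server-a.example", "demo", "1", 1000, 2000)

# ... and merges the replies into a single, position-sorted view.
unified = sorted(
    parse_features(REPLY_A, "server_a") + parse_features(REPLY_B, "server_b"),
    key=lambda row: row["start"],
)
for row in unified:
    print(row["origin"], row["id"], row["type"], row["start"], row["end"])
```

Because each request names a segment, the client never has to download or process a whole dataset to display one region.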

The latest version of OmicsConnect is available for use under standard terms of academic collaboration: http://omicsconnect.org

The tool is currently being improved for better adaptability and faster performance (fall 2015).

Contact info:
Prof. Anthony Brookes
University of Leicester
ajb97@leicester.ac.uk

Key words: genomics, genotype-phenotype, matchmaking, query-by-method apoi, rare disease, software

  1. Making data useful without direct sharing: Cafe Variome and Omics browsing
  2. Issues that restrict sharing data
     • CANNOT: Data owners may not have the time or funding to manually submit data, and/or the submission process and requirements are too complicated
     • WILL NOT: Data owners receive little or no recognition or reward for releasing data, hence little incentive to try
     • MUST NOT: Data owners may have good reasons for not sharing data (ethical, legal, competitive edge)
  3. DATA SHARING IS IMPORTANT BUT DIFFICULT!
  4. SO DO SOMETHING ELSE (AS WELL)
  5. Share the ‘existence’ rather than the ‘substance’ of data. This technology (or similar) sits atop/alongside existing local DBs to bring discoverability and connectivity, without replacing or altering the local solutions
  6. Use Cases/Collaborating Networks
     • Designed to be flexible for a number of use cases
     • Various groups using the tool in different ways
       – Rare disease (variant is the “entity”)
       – Patient centric (patient is the “entity”)
       – Aggregate frequency (i.e. mutation seen with a frequency of X in population)
  7. Café Variome Features
     • Café Variome is not a database but a searchable ‘menu’
     • The platform enables data custodians to specify which users can search for, display counts of, or display details of, which subsets of records and record fields, using various search parameters
     • Results can be returned to users:
       – as core data
       – as links to data at source
       – by computationally facilitating data access requests
  8. Federated Café Variome network
     • Nodes populated with local data
     • Data discovery/sharing options under control of each source
     • Data remain at each source
     • Search interface enables real-time data/subject discovery
     • Each discovered record reported in one of 3 ways, dependent on user permissions and data source settings:
       – Open Access: data provided
       – Restricted Access: interface facilitates email to request data, followed by data supply if/when approved
       – Linked Access: no data provided, only link to data source
     • Each source can control which data fields are searchable and which fields are (potentially) then returned
  9. Simple Query Interface
  10. Complex Query Interface
  11. Query Builder in action
  12. Controlled Display of Matched Record Counts & Data (if permitted)
  13. Phenotype Semantics
      • Allow the phenotypic consequences of genetic entities to be described using public ontologies
        – Many terms from many ontologies can be associated with one entity
      • Allow the phenotypic consequences of genetic entities to be described using a local vocabulary or list
      • Enable hierarchical viewing and querying of the phenotype ontology data
  14. Node Search Options
      • Searches are performed through one nominated head node
      • Searches can be performed from any node in the network
      • Diagram labels: External searches, Head node, Internal searches
  15. Multiple Admin Options
      • Installation wizard
      • Appearance preferences, content management system and statistics reporting
      • Core system settings, defining displayed and searchable fields, bulk import template configuration
      • Record and source management
      • User, group and record access control management
  16. OmicsConnect:
      • Enables collaborators, and (optionally) additional researchers, to view/explore ‘omics’ datasets
      • Provides a mechanism for visual data discovery
      • Provides a unified browser view of ‘omics’ data
      • Separates data sharing into open, controlled and discoverable
      • Copes with different data sources and formats
      • Easy to set up and use
  17. GWAS Central (www.gwascentral.org)
      • Comprehensive genetic association database
      • Aggregate data & extensive metadata
      • Links to data sources for primary data
  18. E.g. visual meta-analysis: compares and contrasts 8 different studies
  19. The Browser
  20. OmicsConnect browser (labels: Local Data, Remote Data)
  21. OmicsConnect browser (Dalliance)
      • Data Sources (DAS, GFF3, BED, wiggle, BigWig, BigBed)
      • Files: FASTA, GFF3, GTF
      • Access Local Databases: MySQL, SQLite
      • Access Online Resources: GWAS Central, Ensembl, UCSC
      • Display Data
      • Simple Interface
      • Use new technologies
      • Low Demands on Resources
      • Platform Independent
      • No dependencies
  22. Track authentication
      • DAS track enabled when passphrase entered
      • DAS track not enabled as no passphrase entered
  23. eDAS ‘gate keeper’
      • Allows researchers to controllably serve their own omics data
        – Authentication (public/private)
        – User accounts
      • Can return the available features for a specific file/genome segment
      • Intuitive interface for upload and management of data, including validation
      • Stylesheets: instructions on how to format the data for viewing
      • Additional feature implementations to the DAS protocol
        – ‘Types’: returns what data types exist in the DAS track
        – ‘Summary’: returns a summary of the data features per segment
        – ‘Search’: returns features based on a keyword given by the user
      • Can be installed independently from OmicsConnect
  24. OmicsConnect & eDAS
      • Diagram labels: Other Genome Browsers, Enhanced Distributed Annotation System (eDAS), Raw Files, Local Databases, Online Resources, Remote Access, Local Access, OmicsConnect Browser, Online Resources
  25. Acknowledgements
      • The research leading to these results has received funding from the EC under the 7th Framework Programme (FP7/2007-2013) grant agreements 261433 (BioSHaRE-EU) and 200754 (GEN2PHEN), and the IMI project grants 115372 (EMIF) and 115736 (EPAD)
      • Tim Beck, Robert Hastings, Charalambos Chrysostomou, Robert Free, Adam Webb, Owen Lancaster, Dhiwagaran Thangavelu, Colin Veal, Morris Swertz et al, Alliance Consortium
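
The eDAS ‘gate keeper’ slide (slide 23) lists three additional commands, ‘Types’, ‘Summary’ and ‘Search’, and slide 22 shows a track being enabled only once a passphrase is entered. The sketch below illustrates that behaviour in memory only. `GatedTrack`, its method names, the feature fields and the passphrase check are hypothetical and do not reflect the real eDAS implementation or its wire format.

```python
from collections import Counter


class GatedTrack:
    """Hypothetical in-memory stand-in for one passphrase-protected track."""

    def __init__(self, name, features, passphrase=None):
        self.name = name
        self._features = features      # dicts with segment, type and label keys
        self._passphrase = passphrase  # None means the track is public

    def _check(self, passphrase):
        """Refuse to answer unless the track is public or the passphrase matches."""
        if self._passphrase is not None and passphrase != self._passphrase:
            raise PermissionError(f"track {self.name!r} not enabled: passphrase required")

    def types(self, passphrase=None):
        """'Types': which data types exist in the track."""
        self._check(passphrase)
        return sorted({f["type"] for f in self._features})

    def summary(self, passphrase=None):
        """'Summary': number of features per segment."""
        self._check(passphrase)
        return dict(Counter(f["segment"] for f in self._features))

    def search(self, keyword, passphrase=None):
        """'Search': features whose label contains the keyword (case-insensitive)."""
        self._check(passphrase)
        needle = keyword.lower()
        return [f for f in self._features if needle in f["label"].lower()]


# Illustrative features only.
features = [
    {"segment": "1", "type": "gene", "label": "geneX"},
    {"segment": "1", "type": "snp", "label": "marker7"},
    {"segment": "2", "type": "snp", "label": "marker9"},
]
track = GatedTrack("private_study", features, passphrase="s3cret")

print(track.types(passphrase="s3cret"))
print(track.summary(passphrase="s3cret"))
print(track.search("MARKER", passphrase="s3cret"))

try:
    track.types()  # no passphrase supplied
except PermissionError as err:
    print(err)
```

A production service would compare secrets in constant time and tie access to user accounts rather than a single shared passphrase. The sketch only shows where the gate sits relative to the three commands.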

Editor's notes

  • One key question is why we are developing a system such as this. Well, firstly…

    One major issue is that the personnel in diagnostic labs simply don’t have the time or funding to go on to the internet and manually submit their data to repositories such as locus-specific databases (LSDBs).

    Secondly, even if they were to do this, they would receive no recognition or reward for releasing their data, and hence they have little incentive to try (discussed more in Mummi’s section).

    Therefore, Café Rouge aims to overcome these bottlenecks and facilitate the automated transfer of diagnostic laboratory data to the wider community.
  • First of all I’ll be talking about Cafe Rouge which is a “backronym” for Cafe for Routine Genetic Data Exchange

    The reason for the name…?

    Well, the concept was initially devised back in June at the Cambridge GAM by Tony and David in the Café Rouge restaurant, and there is the receipt, kept for posterity’s sake…

    So first of all, let’s look at what exactly Cafe Rouge aims to do...
  • Let’s say you need to understand thousands or even millions of rows of data or data points, and you have a short time to do it in.

    The data may come from your own lab, in which case you’re already familiar with what it’s measuring and what the results are likely to be.

    Or it may come from another team, or maybe several teams at once, and be completely unfamiliar.

    How can we make sense of the data efficiently and quickly?

    Data visualization may provide the answer.

    There are two kinds: visualization for exploring and visualization for explaining.

    Visualization for exploring can be imprecise.

    It’s useful when you’re not exactly sure what the data has to tell you and you’re trying to get a sense of the relationships and patterns contained within it for the first time. It may take a while to figure out how to approach or clean the data, and which dimensions to include. Therefore, visualization for exploring is best done in such a way that it can be iterated quickly and experimented upon, so that you can find the signal within the noise.

    Visualization for explaining is best when it is cleanest. Here, the ability to pare down the information to its simplest form — to strip away the noise entirely — will increase the efficiency with which a decision maker can understand it. This is the approach to take once you understand what the data is telling you, and you want to communicate that to someone else. This is the kind of visualization you should be finding in those presentations and sales reports.
  • Data can be compared and contrasted from studies of interest, selecting resolution and scale of data
  • Collaboration tool
    Publication
    Data discovery tool
    Why it’s unique
    Aggregate data

    Why you use it
    Access control
