A poster showing the current state of the UniProt SPARQL endpoint. People use it to run complex analytical queries on UniProt data. The flexibility of the SPARQL language lets users approach our data from starting points other than entries, e.g. diseases. The endpoint is an additional service provided by the UniProt consortium.
Its usage numbers are comparable to those of our FTP site. Although usage varies from month to month, we know it has a significant human component, because usage is significantly lower during weekends and holiday periods. Some usage numbers are marked with a green dot; there, a robot or search engine might be the source of the higher numbers. However, we are not sure at this time, as the cause could also be users working from cloud computers.
sparql.uniprot.org in production poster
Figure: Usage of http://sparql.uniprot.org
See also:
www.isb-sib.org
See also:
www.sib.swiss
Contact
help@uniprot.org
www.uniprot.org
The UniProt SPARQL Endpoint: 34 Billion Triples in Production
Jerven Bolleman1, Sebastien Gehant1, Thierry Lombardot1, Alan Bridge1, Ioannis Xenarios1,2,3, Nicole Redaschi1,
and the UniProt Consortium1,4,5
1Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva 4, Switzerland, 2Vital-IT Group, SIB Swiss Institute of
Bioinformatics, Quartier Sorge, Bâtiment Génopode, 1015 Lausanne, Switzerland, 3University of Lausanne, 1015 Lausanne, Switzerland, 4European
Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton Cambridge CB10 1SD, UK, 5Protein Information Resource (PIR),
Georgetown University Medical Center, 3300 Whitehaven Street, NW, Suite 1200, Washington, DC 20007, USA
UniProt is mainly supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI)
and National Institute of General Medical Sciences (NIGMS) grant U41HG007822. Additional support for the EBI's
involvement in UniProt comes from the NIH grant 2P41 HG02273. Swiss-Prot activities at the SIB are supported by the Swiss
Federal Government through the State Secretariat for Education, Research and Innovation SERI. PIR's UniProt activities are
also supported by the NIH grants 5R01GM080646-07, 3R01GM080646-07S1, 5G08LM010720-03, and 8P20GM103446-12,
and the National Science Foundation (NSF) grant DBI-1062520.
UniProt on the web
UniProt is a comprehensive resource for protein sequence
and annotation data. It has been available on the web since
its creation in 2002 (and its predecessors Swiss-Prot and
TrEMBL much longer...).
UniProt on the semantic web
All UniProt data is available in RDF since 2007 and can be
downloaded in this format from the UniProt FTP site and the
www.uniprot.org REST interface. Since 2014 you can also
query the data directly on our public SPARQL endpoint at
sparql.uniprot.org.
The UniProt data has grown eightfold over the last five
years. UniProt release 2017_11 consists of 34 billion triples
and requires just over 1.6 TB of disk space when loaded in
Virtuoso 7.2, a columnar relational database that supports
SPARQL.
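As a rough back-of-the-envelope check, the two figures above imply an average on-disk cost per triple. The sketch below assumes decimal terabytes and treats "just over 1.6 TB" as exactly 1.6 TB, so the result is only an approximation; the function name is made up for illustration.

```python
# Figures quoted on the poster for UniProt release 2017_11.
TRIPLE_COUNT = 34e9      # 34 billion triples
DISK_BYTES = 1.6e12      # "just over 1.6 TB", assumed to be decimal terabytes


def bytes_per_triple(disk_bytes: float, triple_count: float) -> float:
    """Average disk space per triple, indexes and overhead included."""
    return disk_bytes / triple_count


if __name__ == "__main__":
    print(f"about {bytes_per_triple(DISK_BYTES, TRIPLE_COUNT):.0f} bytes per triple")
```

The estimate covers everything the database keeps on disk, not only the raw triples.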
at your SERVICE
The SERVICE keyword allows you to run part of your query
on another SPARQL endpoint. For example, you can
combine the UniProt and Ensembl endpoints to get the
coding exons for a protein.
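# Namespace prefixes used by the query
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX database: <http://purl.uniprot.org/database/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>

SELECT ?protein ?transcript ?exon ?order {
  # Evaluated on the UniProt endpoint: proteins and their Ensembl transcripts
  ?protein rdfs:seeAlso ?transcript .
  ?transcript up:database database:Ensembl .
  # Evaluated on the Ensembl endpoint: exons of each transcript, with their order
  SERVICE <http://www.ebi.ac.uk/rdf/services/ensembl/sparql/>
  {
    ?transcript obo:SO_translates_to ?peptide .
    ?peptide a ensemblterms:protein .
    ?transcript obo:SO_has_part ?exon ;
                sio:SIO_000974 ?orderedPart .
    ?orderedPart sio:SIO_000628 ?exon ;
                 sio:SIO_000300 ?order .
  }
}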
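Conceptually, a SERVICE clause contributes a second set of variable bindings that is joined with the local ones on the variables they share (here ?transcript). The toy sketch below illustrates that join in Python. It is not how Virtuoso or any other engine evaluates federated queries, and the identifiers in the sample data are hypothetical placeholders.

```python
from typing import Dict, Iterable, List

Binding = Dict[str, str]  # variable name -> bound value


def join(left: Iterable[Binding], right: Iterable[Binding]) -> List[Binding]:
    """Merge every pair of bindings that agree on all variables they share."""
    right_rows = list(right)
    joined = []
    for l_row in left:
        for r_row in right_rows:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical bindings from the local (UniProt) part of the query.
local_rows = [
    {"protein": "P1", "transcript": "T1"},
    {"protein": "P2", "transcript": "T2"},
]
# Hypothetical bindings returned by the remote (Ensembl) SERVICE block.
remote_rows = [
    {"transcript": "T1", "exon": "E1", "order": "1"},
    {"transcript": "T1", "exon": "E2", "order": "2"},
]

if __name__ == "__main__":
    for row in join(local_rows, remote_rows):
        print(row)
```

Only the rows for transcript T1 survive the join, because the remote side returned nothing for T2.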
Using http://sparql.uniprot.org
This website contains example queries with brief English
explanations. You can download query results in a number
of formats, including tab- or comma-separated for use in
Excel, R and other tools.
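The same results can also be fetched from a script. The sketch below uses only the Python standard library and follows the SPARQL 1.1 Protocol: the query goes in a `query` URL parameter and an `Accept: text/csv` header asks for comma-separated results. The `/sparql` path on the endpoint and the helper names are assumptions made for illustration, so check the website for the exact address before relying on it.

```python
import csv
import io
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumed endpoint address; the "/sparql" path is not stated on the poster.
ENDPOINT = "http://sparql.uniprot.org/sparql"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for CSV results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Turn a CSV result document into one dict per result row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def run_query(query: str) -> List[Dict[str, str]]:
    """Send the query over HTTP and return the parsed rows (needs network access)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return parse_csv(response.read().decode("utf-8"))


if __name__ == "__main__":
    for row in run_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"):
        print(row)
```

Keeping request building and parsing separate from the network call makes the two pure functions easy to check offline.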
SPARQL: A graph query language
SPARQL is a standard for querying a graph database
and looks a little bit like SQL. It is optimised for pattern
matching and cross data source queries. There are more
than 40 compliant implementations of the latest version 1.1
recommendation.
Hardware
Node 1
64 CPU cores
256 GB RAM
8 TB consumer SSD
Node 2
64 CPU cores
256 GB RAM
8 TB consumer SSD
Load Balancer = Apache mod_balancer
Many more
endpoints on the web
Triples: Simple sentences for complicated data
RDF uses (many) simple ‘sentences’ to describe
information. Each one consists of subject-predicate-object,
making it a triple. Example:
<http://sparql.uniprot.org/> rdfs:comment "a free API for you" .
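To make the subject-predicate-object idea concrete, here is a minimal Python sketch that stores triples as 3-tuples and matches them against a pattern, with `None` playing the role of a SPARQL variable. It is a toy model and not an RDF library; the second triple is made up for illustration.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

FACTS: List[Triple] = [
    ("<http://sparql.uniprot.org/>", "rdfs:comment", '"a free API for you"'),
    # Made-up second triple so that the pattern has something to filter out.
    ("<http://sparql.uniprot.org/>", "rdfs:label", '"example label"'),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None matches any value."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


if __name__ == "__main__":
    print(match(FACTS, predicate="rdfs:comment"))
```

A SPARQL query is essentially a set of such patterns whose variables must be bound consistently across all of them.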
SPARQL endpoints: Communicating over HTTP (SERVICE)
Your everyday tools: Accessing endpoints over HTTP (SPARQL API)
ChEMBL & more
Powered by Vital-IT