Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to the NCBI
1. Use of CEDAR Technology for Ontology-based Submission of Biomedical Data to the NCBI
Syed Ahmad Chan Bukhari Ph.D., Kei-Hoi Cheung Ph.D., Steven H Kleinstein Ph.D.
Yale University
2. NCBI is an important resource to archive biomedical data
● NCBI hosts a collection of biomedical databases:
○ BioProject, BioSample, SRA, GenBank, GEO etc.
● Provides infrastructure to submit experimental data and associated metadata
● Minimal use of standard terminologies to define the necessary metadata
○ Ontologies recommended for some data elements (Not implemented)
● NCBI metadata are often described using inconsistent terminologies
○ This limits our ability to find, access, interoperate with and reuse the data sets
Goal: Leverage CEDAR to improve NCBI metadata submissions
The NCBI BioSample guidelines suggest using Disease Ontology terms
3. How are metadata currently submitted to NCBI?
BioProject
BioSample
Sequence Read Archive
Combination of web-based forms and Excel templates
● No mechanism to enforce standardized vocabularies or ontology links
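As a rough illustration of what enforcing a standardized vocabulary at submission time could look like, the sketch below checks a metadata record against small value sets. The field names, the value sets and the `validate_record` helper are hypothetical; they are not part of any NCBI or CEDAR interface.

```python
# Hypothetical value sets; a real system would draw these from an ontology service.
ALLOWED_VALUES = {
    "sex": {"male", "female", "not collected"},
    "cell_type": {"B cell", "T cell"},
}


def validate_record(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' is not in the controlled vocabulary")
    return problems


if __name__ == "__main__":
    # A free-text variant such as "Bcell" is rejected instead of silently stored.
    print(validate_record({"sex": "male", "cell_type": "Bcell"}))
```

A web form or spreadsheet template without such a check would accept the record as typed, which is how inconsistent spellings end up in the archive.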
4. NCBI repositories need improved metadata
CEDAR maps components (e.g., entities, attributes, and value sets) to standard
ontologies that provide global definitions and machine-readable identifiers
Link to BRENDA Tissue and Enzyme Source Ontology (BTO)
Link to Cell Ontology
Example NCBI BioSample Record
“B cell”, “B-cell” and “Bcell”
CEDAR-to-NCBI Solution
Link to Cell Ontology
Link to Disease Ontology
(for real-time validation)
Wrong location for info
Link to NCBI Taxonomy Ontology
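To make the “B cell”, “B-cell” and “Bcell” example concrete, here is a minimal sketch of mapping free-text variants onto a single ontology term with a machine-readable identifier. The lookup table and both helper functions are assumptions made for illustration, and the identifier shown for B cell should be checked against the current Cell Ontology release; CEDAR itself resolves terms through ontology services rather than a hard-coded table.

```python
import re

# Assumed mapping from a normalized label to (Cell Ontology ID, preferred label).
SYNONYM_TO_TERM = {
    "bcell": ("CL:0000236", "B cell"),
}


def normalize(label):
    """Lowercase the label and drop spaces, hyphens and underscores."""
    return re.sub(r"[\s\-_]+", "", label).lower()


def map_to_ontology(label):
    """Return (term ID, preferred label) for a free-text label, or None if unknown."""
    return SYNONYM_TO_TERM.get(normalize(label))


if __name__ == "__main__":
    for variant in ["B cell", "B-cell", "Bcell"]:
        print(variant, "->", map_to_ontology(variant))
```

All three spellings resolve to the same term, so a query on the identifier finds every sample regardless of how the submitter typed the label.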
5. Adaptive Immune-Receptor Repertoire (AIRR) Community
Next-generation sequencing of B & T cell receptor repertoires (AIRR-seq)
Developing standard protocols for reporting and sharing AIRR-seq data to optimize their use in biomedical research and patient care
AIRR Working Groups
Minimal Standards
Tools and Resources
Common Repository
AIRR Community Formed
6. AIRR Community Data Elements
Each of the 6 high-level principles has been expanded into a set of data elements
● (1) Study, Subject, Diagnosis
○ Study title
○ Study type
○ Study inclusion/exclusion criteria
○ Grant funding agency
○ Lab name
○ Contact information
○ Contact of person uploading data
○ Lab address
○ Relevant publications (identifiers)
○ Subject ID
○ Animal, human or synthetic
○ Sex
○ Age
○ Age event
○ Ancestry population
○ Ethnicity
○ Race
○ Species name
○ Strain name
○ Linked to other subject?
○ Type of link
○ Relevant Clinical History
○ Study Group Description
○ Disease(s)
○ Disease stage
○ Process type
○ Immunogen/agent
● (2) Sample Processing
○ Biological sample ID
○ Sample type
○ Anatomic site/source
○ Disease state of sample
○ Sample collection time (relative to T0)
○ Collection time event (T0)
○ Source (from commercial)
○ Experiment Sample
○ Tissue processing
○ Cell isolation/enrichment procedure
○ Processing (sample)
○ Cell subset
○ Cell subset phenotype
○ Single cell or bulk?
○ How many cells in experiment?
○ Number of cells per sequencing reaction
● (3) Nucleic Acid Processing and Sequencing
○ Target substrate (DNA or RNA)
○ Library generation method
○ Library generation protocol
○ Target locus for PCR
○ Forward PCR primer location
○ Reverse PCR primer location
○ Forward primer sequences
○ Reverse primer sequences
○ Whole vs. partial sequences
○ Heavy vs. Light vs. paired
○ Amount of template (ng)
○ Total reads
○ Total reads passing QC
○ Calibrator and other internal controls
○ Protocol ID(s)
○ Sequencing platform
○ Read length(s)
○ Sequencing facility
○ Batch number
○ Date of Sequencing run
○ Sequencing kit
● (4) Raw Data
○ File containing the raw sequences
● (5) Data Processing
○ Names of software tools
○ Version numbers
○ Paired read assembly
○ Quality thresholds
○ Primer match cut-offs
○ Collapsing method
○ Data processing protocols (free text)
○ V(D)J germline reference database
● (6) Processed Sequences with Annotations
○ V gene
○ D gene
○ J gene
○ CDR3 nucleotide sequence
○ CDR3 amino acid sequence
○ Read count
Standard implemented @ NCBI
BioProject
BioSample
SRA
GenBank
Deposited at FAIRsharing.org:
https://fairsharing.org/bsg-s000689
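One way to picture how the AIRR data elements could drive a submission check is to treat them as a per-category checklist. The sketch below uses only a handful of the element names from the slide; which elements are mandatory, the assignment of elements to categories, and the `missing_elements` helper are all assumptions for illustration, not the AIRR standard's own rules.

```python
# A small, assumed subset of AIRR data elements, grouped by category.
AIRR_ELEMENTS = {
    "Study, Subject, Diagnosis": ["Study title", "Subject ID", "Species name"],
    "Sample Processing": ["Biological sample ID", "Sample type"],
    "Nucleic Acid Processing and Sequencing": ["Sequencing platform"],
    "Raw Data": ["File containing the raw sequences"],
    "Data Processing": ["Names of software tools", "Version numbers"],
    "Processed Sequences with Annotations": ["V gene", "J gene"],
}


def missing_elements(submission):
    """Return {category: [element, ...]} for elements that are absent or empty."""
    missing = {}
    for category, elements in AIRR_ELEMENTS.items():
        absent = [name for name in elements if not submission.get(name)]
        if absent:
            missing[category] = absent
    return missing


if __name__ == "__main__":
    draft = {"Study title": "Example study", "Subject ID": "S1", "Sample type": "PBMC"}
    for category, names in missing_elements(draft).items():
        print(category, "->", ", ".join(names))
```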
8. CEDAR-AIRR-NCBI Metadata Generation
Data Submitter
NCBI CEDAR
Controlled Vocabularies
Predictive Entry
Interactive Metadata Entry
Metadata Findability
Metadata Accessibility
Metadata Interoperability
Metadata Reusability
represents limited features availability
Metadata submissions to NCBI BioProject, BioSample and SRA are ontologically controlled and relationally linked, which enables concept-based federated queries across repositories that would otherwise be silos.
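The idea of a concept-based query over linked records can be sketched in a few lines. Everything here is hypothetical: the record layouts, the identifiers such as `PROJECT-A` and `RUN-1`, and the `runs_for_concept` helper are invented for illustration and do not reflect NCBI's actual schemas or query interfaces. The Cell Ontology IDs are assumed to denote B cell (CL:0000236) and T cell (CL:0000084).

```python
# Hypothetical linked records; all identifiers are made up for illustration.
BIOSAMPLES = [
    {"id": "SAMPLE-1", "project": "PROJECT-A", "cell_type": "CL:0000236"},
    {"id": "SAMPLE-2", "project": "PROJECT-B", "cell_type": "CL:0000084"},
    {"id": "SAMPLE-3", "project": "PROJECT-B", "cell_type": "CL:0000236"},
]
SRA_RUNS = [
    {"id": "RUN-1", "sample": "SAMPLE-1"},
    {"id": "RUN-2", "sample": "SAMPLE-2"},
    {"id": "RUN-3", "sample": "SAMPLE-3"},
    {"id": "RUN-4", "sample": "SAMPLE-3"},
]


def runs_for_concept(term_id):
    """Return (project, sample, run) triples for samples annotated with term_id."""
    # Select samples by ontology identifier rather than by free-text label.
    matching = {s["id"]: s["project"] for s in BIOSAMPLES if s["cell_type"] == term_id}
    # Follow the sample link on each run to connect it to a project.
    return [
        (matching[run["sample"]], run["sample"], run["id"])
        for run in SRA_RUNS
        if run["sample"] in matching
    ]


if __name__ == "__main__":
    for triple in runs_for_concept("CL:0000236"):
        print(triple)
```

Because samples carry a term identifier and runs carry a sample link, a single concept lookup reaches records in all three collections without any string matching on labels.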
11. Acknowledgment
● Supported by the National Institutes of Health through the NIH Big Data to Knowledge program under grant U54AI117925.
● Ben Busby, NCBI
● Leila Rassi, SRA
● Tanya Barrett, GEO
● Kleinstein Lab
● Team CEDAR