2. Outline
• Introduction to H3Africa and H3ABioNet
• H3Africa data
• Data sharing policy
• Building infrastructure
• Computing infrastructure
• Human capacity
• Data harmonization & curation
• Facilitating data access
3. H3Africa: Human Heredity & Health in Africa
• H3Africa Vision: “To facilitate an Africa-based
contemporary research approach to the study of
genomics and environmental determinants of
common diseases with the goal of improving the
health of African populations”
• Funding: NIH, Wellcome Trust/AESA
4. The H3Africa Consortium
• 14 Collaborative Centers
• 13 Research Projects
• 3 Pilot Biorepositories
• 8 Ethics Grants
• Bioinformatics Network (H3ABioNet)
• 4 Global Health Bioinformatics Training Programs
5. H3ABioNet Informatics network
• H3ABioNet is a pan-African informatics network that provides
bioinformatics infrastructure and support for the H3Africa
consortium
• Round 1: 34 partners in 14 African countries
• Round 2: 28 partners in 17 countries
• Activities:
– Infrastructure
– User support
– Research
– Training
www.h3abionet.org
6. H3Africa data (Phase I)
• Phenotype data (associated with genotype data)
– Demographic information
– Anthropometric data
– Disease and health related phenotype data
• Genetic variation data (human and pathogen)
– Sequence data (whole genome, exome, targeted)
• Genotyping chip array data
– ~55,000 samples to be run on an H3Africa African custom chip
• Microbiome sequence data
– Patient/sample phenotypes
– Non-human 16S rRNA sequence data for microbiome
– Non-human full genome sequence data for microbiome
– Possible human sequence contamination
• Biospecimens to be deposited at the H3Africa biorepositories
Image credits: National Human Genome Research Institute (https://www.genome.gov/imagegallery/)
7. Why share data?
• New era of open science
• Enables reproducible science
• Increases visibility and credibility of data generators
• Additional publications and citations
• New research questions can be asked of data
• New discoveries made of relevance to participants
• Increasing sample size
• Increases value of the data
• Funder requirement
8. Limits to sharing human genetic data
• Data can be stored indefinitely and biobank specimens
for up to 20 years, which allows secondary use as
‘omics technologies rapidly evolve
• Blood sample collection and clinic visits are
associated with disease and treatment, even for
healthy controls
• Ethics consent: some H3Africa projects have
broad consent, others used tiered or
specific consent
• History of vulnerable populations, low education
levels and exploitation
• Data are anonymized, but a risk of identification remains
• Ethical considerations: informed consent, participant
identification, stigmatisation, benefit sharing
9. Human genetic data privacy
• Age & Sex
• Country of birth
• Current residence
• Native language
• Ethno-linguistic/tribal affiliation
• Country of birth of father and mother
• Native language of father and mother
• Ethno-linguistic/tribal affiliation of
mother and father
• Height
• Weight
• Current medications
• Smoking history
• Alcohol history
• Combining phenotype and genetic data makes it possible to
identify different populations and individuals, so access is restricted
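The re-identification risk above can be illustrated with a small sketch. It counts how many records share the same combination of attributes; a group of size 1 means that combination singles out one participant. The records and field names are invented for illustration and are not H3Africa data.

```python
from collections import Counter

# Hypothetical records: every value below is invented for illustration.
records = [
    {"sex": "F", "country_of_birth": "A", "native_language": "x"},
    {"sex": "F", "country_of_birth": "A", "native_language": "x"},
    {"sex": "M", "country_of_birth": "A", "native_language": "y"},
    {"sex": "M", "country_of_birth": "B", "native_language": "y"},
]

def smallest_group(records, fields):
    """Return the size of the smallest group of records that share
    identical values for all of the given fields."""
    groups = Counter(tuple(r[f] for f in fields) for r in records)
    return min(groups.values())

# One attribute alone leaves every participant in a group of two.
print(smallest_group(records, ["sex"]))
# Adding a second attribute isolates some participants in a group of one.
print(smallest_group(records, ["sex", "country_of_birth"]))
```

The more attributes are combined, the smaller the groups become, which is one reason the phenotype fields listed on this slide sit behind controlled access.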
10. H3Africa Data Sharing, Access and Release Policy
• Balance between ensuring adequate safeguards to protect
participants and not being a barrier for scientists to advance
research
• Maximizing the availability of research data, in a timely and
responsible manner
• Protecting the rights and privacy of human subjects who
participated in research studies
• Recognizing the scientific contribution of researchers who
generated the data
• Considering the nature and ethics of the research proposed in
establishing the timely release of data, and mechanisms of data
sharing
• Promoting deposition of genomic data in existing community data
repositories whenever possible
11. H3Africa DSAR policy
• For genomic and phenotype data:
– Submit to the H3Africa archive
– 9 months to submit to a public repository
– 12-month publication embargo
– In EGA, access is controlled by the DBAC
Timeline (23 months in total):
• Research site: data generation
• 2 months: research site QC of genomic & phenotypic data
• 9 months: genomic & phenotypic data stored at H3ABioNet
• 12 months: genomic & phenotypic data in EGA, available through DBAC with publication embargo
• Long term: genomic & phenotypic data in EGA, available through DBAC without publication embargo
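As a rough sketch of the timeline arithmetic, the stage durations on the slide (2, 9 and 12 months) can be accumulated from the date data generation finishes. The milestone labels and the example start date are assumptions made for illustration; the policy document itself remains the authority on how each period is counted.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Shift a date forward by whole calendar months,
    clamping the day to the last day of the target month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

# Stage durations in months, taken from the slide; labels are paraphrased.
STAGES = [
    ("QC at research site complete", 2),
    ("Archive period ends, data in EGA", 9),
    ("Publication embargo ends", 12),
]

def milestones(data_generated):
    """Return (label, months elapsed, date) for each stage boundary."""
    elapsed = 0
    result = []
    for label, months in STAGES:
        elapsed += months
        result.append((label, elapsed, add_months(data_generated, elapsed)))
    return result

# Hypothetical completion date for data generation.
for label, elapsed, when in milestones(date(2017, 1, 15)):
    print(f"{elapsed:>2} months  {when.isoformat()}  {label}")
```

The cumulative values 2, 11 and 23 months match the 23-month total shown on the slide.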
12. Data and Biospecimen Access Committee
• Review and approve requests for data and/or biospecimens
• Biospecimens:
– For the first 3 years, access outside H3Africa is limited to those collaborating in
Africa
– Use info on availability in biobanks
– Data generated must be submitted to EGA
– Scientific review/funding available
• Data:
– DBAC will ensure the requestor has expertise and resources
– Scientific review
• Evaluation criteria:
– Scientific merit
– Institutional capacity for the research
– Potential for publication or translation, e.g. new therapies
13. Data access agreement
• H3Africa not liable for use of data
• Only use data for agreed purpose
• Maintain data confidentiality
• Make sure data is secure
• Acknowledge source of data
• Submit annual reports
• Project put onto website
• Access is granted for 1 year
14. What is required for sharing data?
• Consent from participants (varying consent within a study
is difficult)
• Robust data sharing model with implementation strategy
for data access, transfer, etc.
• Access agreements and MoUs
• Infrastructure for
• Data transfer
• Data storage & compute
• Training
• Data curation and harmonization
15. Infrastructure development & support
• Node server purchases
• Sys Admin “How to” documents
• Access to HPC, Cloud (Docker containers)
• Internet connectivity measurement (NetMap)
• Data transfer: Globus Online, testing vs Aspera
• Data storage
• Training in IT, data management and general bioinformatics use
• H3ABioNet combined equipment: 512 cores, 2384 GB RAM, 120 TB storage
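To show why connectivity measurement matters for data transfer planning, here is a back-of-the-envelope estimate of transfer time. The function name, the decimal-terabyte convention and the bandwidth figures are assumptions for illustration, not measured H3ABioNet values.

```python
def transfer_hours(terabytes, megabits_per_second, efficiency=1.0):
    """Estimate hours needed to move a dataset over a network link.

    terabytes: dataset size in decimal TB (1 TB = 10**12 bytes).
    megabits_per_second: nominal link bandwidth.
    efficiency: assumed fraction of nominal bandwidth actually achieved.
    """
    if megabits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6 * efficiency)
    return seconds / 3600

# Hypothetical scenario: 1 TB over a 100 Mbit/s link at half the nominal speed.
print(round(transfer_hours(1, 100, efficiency=0.5), 1), "hours")
```

Estimates like this help decide whether a dataset can realistically move over the network or whether a different route is needed.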
16. Building human capacity for genomics data management
• Need to train:
– Bioinformaticians
– Data scientists
– Bioinformatics users
– Medical professionals
• Specialised courses, shadow teams, internships
• ISCB, EMBL-EBI training team
17. Training Approaches
• Face-to-face workshops
• Train-the-Trainer
• Internships
• Live online training
• Hackathons/Data Jamborees
• Access to training materials
20. Harmonizing H3Africa data
• Mapping biobank data to OMIABIS ontology
• Mapping CRFs to ontologies, e.g. phenotype or disease ontology
• Mapping genomics data to Experimental Factor ontology
• PHWG has developed a set of core phenotypes and a standard CRF
• Mapping ethics consent info to Data Use ontology
• Biorepositories, Archive & EGA, Catalogue
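A minimal sketch of what mapping CRF fields to ontology terms could look like in practice. The field names and the "EX:" identifiers are placeholders invented for this example; they are not real ontology terms and not the H3Africa standard CRF.

```python
# Hypothetical mapping from local CRF field names to ontology term IDs.
# The "EX:" identifiers are placeholders, not real ontology terms.
CRF_TO_TERM = {
    "ht_cm": "EX:0000001",   # height
    "wt_kg": "EX:0000002",   # weight
    "smoker": "EX:0000003",  # smoking history
}

def harmonize(record, mapping):
    """Split a CRF record into values keyed by ontology term ID
    and a sorted list of field names that have no mapping yet."""
    mapped = {}
    unmapped = []
    for field, value in record.items():
        term = mapping.get(field)
        if term is None:
            unmapped.append(field)
        else:
            mapped[term] = value
    return mapped, sorted(unmapped)

# Hypothetical record from one project's CRF.
mapped, unmapped = harmonize({"ht_cm": 170, "wt_kg": 65, "alcohol": "no"}, CRF_TO_TERM)
print(mapped)
print("needs curation:", unmapped)
```

Reporting the unmapped fields separately keeps the curation backlog visible instead of silently dropping data.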
21. Making data FAIR
• Findable, Accessible, Interoperable, and Re-usable
https://www.force11.org/group/fairgroup/fairprinciples
• To be Findable: identifier, metadata, indexed
• To be Accessible: find by identifier, clear rules for
access and authentication
• To be Interoperable: standardized and cross-
referenced
• To be Reusable: licensed, metadata with provenance,
standards
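One way to make the FAIR checklist concrete is to check a dataset's metadata record against it. The sketch below assumes a simple dictionary of metadata; the key names are illustrative choices, not a published schema.

```python
# Assumed metadata keys, grouped by the FAIR facet they support.
FAIR_REQUIREMENTS = {
    "findable": ["identifier", "description"],
    "accessible": ["access_policy"],
    "interoperable": ["standard"],
    "reusable": ["license", "provenance"],
}

def fair_gaps(metadata):
    """Return, per FAIR facet, the required keys that are missing or empty."""
    gaps = {}
    for facet, keys in FAIR_REQUIREMENTS.items():
        missing = [key for key in keys if not metadata.get(key)]
        if missing:
            gaps[facet] = missing
    return gaps

# Hypothetical metadata record with no licence or provenance recorded.
example = {
    "identifier": "DATASET-0001",
    "description": "Genotype array data",
    "access_policy": "controlled",
    "standard": "example-format",
}
print(fair_gaps(example))
```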
23. H3Africa Data Archive
• Assist H3Africa projects as data coordination center:
– Transfer
– Validate
– Store
– Submit to EGA
– Obtain EGA accessions for publications
• 0.5 petabytes storage size including offsite replication
• Local EGA feasibility?
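The "Validate" step could include, among other checks, confirming that a transferred file arrived intact. The sketch below compares a SHA-256 checksum computed locally with one supplied by the submitter. This is a common general technique; the slides do not say which checks the archive actually runs.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_transfer(path, expected_hex):
    """Return True if the file's checksum matches the submitter's checksum."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A file would only move on to the "Store" step if `validate_transfer` returns True; a mismatch would trigger a re-transfer.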
25. Beacons
…a simple public web service … designed
merely to accept a query of the form "Do you
have any genomes with an 'A' at position
100,735 on chromosome 3" (or similar data)
and responds with one of "Yes" or "No."
genomicsandhealth.org
• Advantages
– Locally hosted
– Minimal information (yes/no for a given allele)
– Protection against “scraping”
https://goo.gl/Bkd0dx
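The yes/no behaviour described above can be sketched in a few lines. This is a toy in-memory lookup, not the GA4GH Beacon API; the variant store and function name are assumptions for illustration.

```python
# Hypothetical local variant store: (chromosome, position, allele) tuples.
VARIANTS = {
    ("3", 100735, "A"),
    ("7", 5500, "T"),
}

def beacon(chromosome, position, allele, variants=VARIANTS):
    """Answer "Yes" or "No" for an allele query, revealing nothing else
    about the underlying genomes."""
    key = (str(chromosome), int(position), allele.upper())
    return "Yes" if key in variants else "No"

# The query from the slide: an 'A' at position 100,735 on chromosome 3.
print(beacon("3", 100735, "A"))
print(beacon("3", 100735, "G"))
```

Because the response carries a single bit per query, the data can stay at the hosting site while still being discoverable.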
26. Summary
• H3Africa is the largest collection of human biomedical
data in Africa to date
• Human data is sensitive and needs to be shared
while protecting participants and researchers
• Need to build infrastructure for sharing:
– harmonized/curated metadata
– storage and transfer facilities
– human capacity (skills)
• Need to provide access tools: web interface, public
repositories, database
• Trying to promote open science: user groups,
sessions
27. Acknowledgements
The H3ABioNet Consortium
Funding: NIH Common Fund, NHGRI grants U41HG006941, U24HG006941
H3ABioNet team at CBIO:
• Sumir Panji
• Gerrit Botha
• Ayton Meintjes
• Suresh Maslamoney
• Vicky Nembaware
• Ziyaad Parker
• Kim Gurwitz
• Mamana Mbiyavanga
• Katherine Johnston
Slides: Sumir Panji, Michelle Skelton