Studying the elusive in larger scale
1. Studying the elusive environment in larger
scale with the exposome and EWAS
Chirag J Patel
Boston University
11/30/15
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
2. P = G + E
P (Phenotype): Type 2 Diabetes, Cancer, Alzheimer’s, Gene expression
G (Genome): Variants
E (Environment): Infectious agents, Nutrients, Pollutants, Drugs
3. We are great at G investigation!
over 2000
Genome-wide Association Studies (GWAS)
https://www.ebi.ac.uk/gwas/
G
4. Nothing comparable to elucidate E influence!
We lack high-throughput methods
and data to discover new E in P…
E: ???
7. $H^2 = \sigma^2_G / \sigma^2_P$
Heritability (H2) is the range of phenotypic variability
attributed to genetic variability in a population
Indicator of the proportion of phenotypic
differences attributed to G.
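As a worked reading of this definition, one can combine it with the P = G + E model from slide 2. The additivity of variances below is an assumption made for illustration (G and E independent, no interaction term); the slides do not state it.

```latex
\sigma^2_P = \sigma^2_G + \sigma^2_E, \qquad
H^2 = \frac{\sigma^2_G}{\sigma^2_P}, \qquad
\frac{\sigma^2_E}{\sigma^2_P} = 1 - H^2
```

Under that assumption, a trait with $H^2 = 0.25$ would leave $1 - 0.25 = 0.75$ of its phenotypic variance to non-genetic sources, a share that in practice also absorbs measurement error and interactions.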
8. Eye color
Hair curliness
Type-1 diabetes
Height
Schizophrenia
Epilepsy
Graves' disease
Celiac disease
Polycystic ovary syndrome
Attention deficit hyperactivity disorder
Bipolar disorder
Obesity
Alzheimer's disease
Anorexia nervosa
Psoriasis
Bone mineral density
Menarche, age at
Nicotine dependence
Sexual orientation
Alcoholism
Lupus
Rheumatoid arthritis
Crohn's disease
Migraine
Thyroid cancer
Autism
Blood pressure, diastolic
Body mass index
Depression
Coronary artery disease
Insomnia
Menopause, age at
Heart disease
Prostate cancer
QT interval
Breast cancer
Ovarian cancer
Hangover
Stroke
Asthma
Blood pressure, systolic
Hypertension
Osteoarthritis
Parkinson's disease
Longevity
Type-2 diabetes
Gallstone disease
Testicular cancer
Cervical cancer
Sciatica
Bladder cancer
Colon cancer
Lung cancer
Leukemia
Stomach cancer
[Figure: bar chart of the traits above; x-axis: Heritability, Var(G)/Var(Phenotype), 0 to 100 (%). Source: SNPedia.com]
G estimates for complex traits are low and variable:
massive opportunity for high-throughput E discovery
Type 2 Diabetes (25%)
Heart Disease (25-30%)
Autism (50%???)
9. [Figure: the same heritability chart, Var(G)/Var(Phenotype) by trait, 0 to 100 (%). Source: SNPedia.com]
G estimates for complex traits are low and variable:
massive opportunity for high-throughput E discovery
$\sigma^2_E$: Exposome!
11. Explaining the other 50%:
A new data-driven paradigm for robust discovery of E
via EWAS and the exposome
what to measure? how to measure?
PERSPECTIVES
[Figure: the exposome]
External environment: radiation, diet, pollution, infections, drugs, life-style, stress
Internal chemical environment: xenobiotics, inflammation, preexisting disease, lipid peroxidation, oxidative stress, gut flora
Classes of exposome chemicals: reactive electrophiles, metals, endocrine disrupters, immune modulators, receptor-binding proteins
[...] critical entity for disease etiology (7). Recent discussion has focused on whether and how to implement this vision (8). Although fully characterizing human exposomes is daunting, strategies can be developed for getting “snapshots” of critical portions of a person’s exposome during different stages of life. At one extreme is a “bottom-up” strategy in which all chemicals in each external source of a subject’s exposome are measured at each time point. Although this approach would have the advantage of relating important exposures to the air, water, or diet, it would require enormous effort and would miss essential components of the internal chemical environment due to such factors as gender, obesity, inflammation, and stress. By contrast, a “top-down” strategy would measure all chemicals (or products of their downstream processing or effects, so-called read-outs or signatures) in a subject’s blood. This would require only a single blood specimen at each time point and would relate directly [...] disruptors and can be measured through serum [...]

[...] chromosome (telomere) length in peripheral blood mononuclear cells responded to chronic psychological stress, possibly mediated by the production of reactive oxygen species (15).
Characterizing the exposome represents a technological challenge like that of the human genome project, which began when DNA sequencing was in its infancy (16). Analytical systems are needed to process small amounts of blood from thousands of subjects. Assays should be multiplexed for measuring many chemicals in each class of interest. Tandem mass spectrometry, gene and protein chips, and microfluidic systems offer the means to do this. Platforms for high-throughput assays should lead to economies of scale, again like those experienced by the human genome project. And because exposome technologies would provide feedback for therapeutic interventions and personalized medicine, they should motivate the development of commercial devices for screening important environmental exposures in blood samples.

With successful characterization of both [...]
Characterizing the exposome. The exposome represents
the combined exposures from all sources that reach the
internal chemical environment. Toxicologically important
classes of exposome chemicals are shown. Signatures and
biomarkers can detect these agents in blood or serum.
“A more comprehensive view of
environmental exposure is
needed ... to discover major
causes of diseases...”
how to analyze in relation to health?
Wild, 2005
Rappaport and Smith, 2010, 2011
Buck-Louis and Sundaram 2012
Miller and Jones, 2014
Patel CJ and Ioannidis JPA, 2014
12. Connecting Environmental Exposure with Disease:
Missing the “System” of Exposures?
[2×2 table: exposure (E+, E-) by outcome (diseased, non-diseased); cells marked "?"]
Exposed to many things, but do not assess the multiplicity.
Fragmented literature of associations.
Challenge to discover E associated with disease.
14. Gold standard for breadth of human exposure information:
National Health and Nutrition Examination Survey (NHANES) [1]
since the 1960s
now continuous, released in 2-year cycles: 1999 onwards
10,000 participants per survey
The sample for the survey is selected to represent the U.S. population of all ages. To produce reliable statistics, NHANES over-samples persons 60 and older, African Americans, and Hispanics.

Since the United States has experienced dramatic growth in the number of older people during this century, the aging population has major implications for health care needs, public policy, and research priorities. NCHS is working with public health agencies to increase the knowledge of the health status of older Americans. NHANES has a primary role in this endeavor.

All participants visit the physician. Dietary interviews and body measurements are included for everyone. All but the very young have a blood sample taken and will have a dental screening. Depending upon the age of the participant, the rest of the examination includes tests and procedures to assess the various aspects of health listed above. In general, the older the individual, the more extensive the examination.

Survey Operations

Health interviews are conducted in respondents’ homes. Health measurements are performed in specially-designed and equipped mobile centers, which travel to locations throughout the country. The study team consists of a physician, medical and health technicians, as well as dietary and health interviewers. Many of the study staff are bilingual (English/Spanish).

An advanced computer system using high-end servers, desktop PCs, and wide-area networking collect and process all of the NHANES data, nearly eliminating the need for paper forms and manual coding operations. This system allows interviewers to use notebook computers with electronic pens. The staff at the mobile center can automatically transmit data into data bases through such devices as digital scales and stadiometers. Touch-sensitive computer screens let respondents enter their own responses to certain sensitive questions in complete privacy. Survey information is available to NCHS staff within 24 hours of collection, which enhances the capability of collecting quality data and increases the speed with which results are released to the public.

In each location, local health and government officials are notified of the upcoming survey. Households in the study area receive a letter from the NCHS Director to introduce the survey. Local media may feature stories about the survey.

NHANES is designed to facilitate and encourage participation. Transportation is provided to and from the mobile center if necessary. Participants receive compensation and a report of medical findings is given to each participant. All information collected in the survey is kept strictly confidential. Privacy is protected by public laws.

Uses of the Data

Information from NHANES is made available through an extensive series of publications and articles in scientific and technical journals. For data users and researchers throughout the world, survey data are available on the internet and on easy-to-use CD-ROMs.

Research organizations, universities, health care providers, and educators benefit from survey information. Primary data users are federal agencies that collaborated in the design and development of the survey. The National Institutes of Health, the Food and Drug Administration, and CDC are among the agencies that rely upon NHANES to provide data essential for the implementation and evaluation of program activities. The U.S. Department of Agriculture and NCHS cooperate in planning and reporting dietary and nutrition information from the survey. NHANES’ partnership with the U.S. Environmental Protection Agency allows continued study of the many important environmental influences on our health.

• Physical fitness and physical functioning
• Reproductive history and sexual behavior
• Respiratory disease (asthma, chronic bronchitis, emphysema)
• Sexually transmitted diseases
• Vision
[1] http://www.cdc.gov/nchs/nhanes.htm
>250 exposures (serum + urine)
GWAS chip
>85 quantitative clinical traits
(e.g., serum glucose, lipids, BMI)
Death index linkage (cause of
death)
15. Gold standard for breadth of human exposure information:
National Health and Nutrition Examination Survey
Nutrients and Vitamins
vitamin D, carotenes
Infectious Agents
hepatitis, HIV, Staph. aureus
Plastics and consumables
phthalates, bisphenol A
Physical Activity
steps
Pesticides and pollutants
atrazine; cadmium; hydrocarbons
Drugs
statins; aspirin
18. What E factors are associated with
mortality and biological aging?
19. EWAS to search for
exposures and behaviors associated with all-cause mortality.
NHANES: 1999-2004
National Death Index linked mortality
246 behaviors and exposures (serum/urine/self-report)
NHANES: 1999-2001
N=330 to 6008 (26 to 655 deaths)
~5.5 years of followup
Cox proportional hazards
baseline exposure and time to death
False discovery rate < 5%
NHANES: 2003-2004
N=177 to 3258 (20-202 deaths)
~2.8 years of followup
p < 0.05
IJE, 2013
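The two-stage screen on this slide (false discovery rate below 5% in the discovery cohorts, then p < 0.05 in the replication cohort) can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the per-exposure p-values have already been produced by some survival model (for example Cox regression), it uses the Benjamini-Hochberg procedure as one common way to control the FDR (the slide does not say which method was used), and the exposure names and p-values are made up.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of p-values declared significant at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def ewas_screen(discovery_p, replication_p, fdr=0.05, replication_alpha=0.05):
    """Two-stage screen: FDR control in discovery, nominal p-value in replication.

    Both arguments are dicts mapping a (hypothetical) exposure name to the
    p-value of its association with the outcome in that cohort.
    """
    names = sorted(discovery_p)
    passed = benjamini_hochberg([discovery_p[n] for n in names], alpha=fdr)
    candidates = [names[i] for i in passed]
    return [n for n in candidates
            if n in replication_p and replication_p[n] < replication_alpha]


# Made-up p-values for illustration only; they are not results from the study.
discovery = {"exposure_a": 0.0001, "exposure_b": 0.004, "exposure_c": 0.04,
             "exposure_d": 0.20, "exposure_e": 0.65}
replication = {"exposure_a": 0.01, "exposure_b": 0.30, "exposure_c": 0.02}
validated = ewas_screen(discovery, replication)
print(validated)
```

Reporting every probed association alongside the validated list, rather than only the survivors, is what keeps a screen like this from becoming selective reporting.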
21. EWAS (re)-identifies factors associated with all-cause mortality:
Volcano plot of 200 associations
[Figure: volcano plot. x-axis: adjusted hazard ratio (0.4 to 2.8); y-axis: -log10(pvalue) (0 to 8)]
Numbered exposures and behaviors:
1 Physical Activity
2 Does anyone smoke in home?
3 Cadmium
4 Cadmium, urine
5 Past smoker
6 Current smoker
7 trans-lycopene
Numbered demographic and socioeconomic factors [in red]:
1 age (10 year increment)
2 SES_1
3 male
4 SES_0
5 black
6 SES_2
7 SES_3
8 education_hs
9 other_eth
10 mexican
11 occupation_blue_semi
12 education_less_hs
13 occupation_never
14 occupation_blue_high
15 occupation_white_semi
16 other_hispanic
Annotated points:
age (10 years); male; black; income (quintiles 1, 2, 3)
age, sex, income, education, race/ethnicity, occupation [in red]
anyone smoke in home?
past smoker?
current smoker?
serum and urine cadmium [1 SD]
serum lycopene [1 SD]
physical activity [low, moderate, high activity]*
*derived from METs per activity and categorized by Health.gov guidelines
R2 ~ 2%
22. 452 associations in Telomere Length:
Polychlorinated biphenyls associated with longer telomeres?!
Manrai, Kohane (in review)
[Figure: volcano plot. x-axis: effect size (−0.2 to 0.2; negative: shorter telomeres, positive: longer telomeres); y-axis: −log10(pvalue) (0 to 4); FDR<5% threshold marked]
Labeled points: PCBs, Trunk Fat, Alk. Phos, CRP, Cadmium, Cadmium (urine), cigs per day, retinyl stearate, VO2 Max, pulse rate
R2 ~ 1%
adjusted by age, age2, race, poverty, education, occupation
median N=3000 (range 300-7000)
23. Identification of seven loci affecting mean telomere length and their association with disease
Nature Genetics, 2013
A full list of authors and affiliations appears at the end of the paper.
Received 26 June 2012; accepted 19 December 2012; published online 27 March 2013; doi:10.1038/ng.2528

Interindividual variation in mean leukocyte telomere length (LTL) is associated with cancer and several age-associated diseases. We report here a genome-wide meta-analysis of 37,684 individuals with replication of selected variants in an additional 10,739 individuals. We identified seven loci, including five new loci, associated with mean LTL (P < 5 × 10−8). Five of the loci contain candidate genes (TERC, TERT, NAF1, OBFC1 and RTEL1) that are known to be involved in telomere biology. Lead SNPs at two loci (TERC and TERT) associate with several cancers and other diseases, including idiopathic pulmonary fibrosis. Moreover, a genetic risk score analysis combining lead variants at all 7 loci in 22,233 coronary artery disease cases and 64,762 controls showed an association of the alleles associated with shorter LTL with increased risk of coronary artery disease (21% (95% confidence interval, 5–35%) per standard deviation in LTL, P = 0.014). Our findings support a causal role of telomere-length variation in some age-related diseases.

Telomeres are the protein-bound DNA repeat structures at the ends of chromosomes that are important in maintaining genomic stability (1). They are critical in regulating cellular replicative capacity (2). During somatic-cell replication, telomere length progressively shortens because of the inability of DNA polymerase to fully replicate the 3′ end of the DNA strand. Once a critically short telomere length is reached, the cell is triggered to enter replicative senescence, which subsequently leads to cell death (1,2). Conversely, in germ cells and other stem cells that require renewal, telomere length is maintained by the enzyme telomerase, a ribonucleoprotein that contains the RNA template TERC and a reverse transcriptase TERT (3). Both longer and shorter telomere length are associated with increased risk of certain cancers (4,5), and reactivation of telomerase, which bypasses cellular senescence, is a common requirement for oncogenic progression (6). Therefore, telomere length is an important determinant of telomere function.

Mean telomere length exhibits considerable interindividual variability and has high heritability with estimates varying between 44% and 80% (refs. 7–9). Most of these studies have measured mean telomere length in blood leukocytes. However, there is evidence that, within an individual, mean LTL and telomere length in other tissues are highly correlated (10,11). In cross-sectional population studies, mean LTL is longer in women than in men and is inversely associated with age (declining by between 20–40 bp per year) (9,12–14). Shorter age-adjusted and sex-adjusted mean LTL has been found to be associated with risk of several age-related diseases, including coronary artery disease (CAD) (12–15), and has been advanced as a marker of biological aging (16). However, the extent to which the association of shorter LTL with age-related disorders is causal in nature remains unclear. Identifying genetic variants that affect telomere length and testing their association with disease could clarify any causal role.

So far, common variants at two loci on chromosome 3q26 (TERC) (17–19) and chromosome 10q24.33 (OBFC1) (18), which explain <1% of the variance in telomere length, have shown a replicated association with mean LTL in genome-wide association studies (GWAS). To identify other genetic determinants of LTL, we conducted a large-scale GWAS meta-analysis of 37,684 individuals from 15 cohorts, followed by replication of selected variants in an additional 10,739 individuals from 6 more cohorts.

Details of the studies included in the GWAS meta-analysis and in the replication phase are provided in the Supplementary Note, and key characteristics are summarized in Supplementary Table 1. All subjects were of European descent, the majority of the cohorts were population based and three of the replication cohorts were additional subjects from studies used in the meta-analysis. The genotyping platforms and the imputation method (to HapMap 2 build 36) used by each GWAS cohort are summarized in Supplementary Table 2. We measured mean LTL in each cohort using a quantitative PCR method and expressed it as a ratio of telomere repeat length to copy number of a single-copy gene (T/S ratio; Online Methods and Supplementary Note).

Then we analyzed LTL, adjusted for age, sex and any study-specific covariates, for association with genotype using linear regression in each study and adjusted the results for genomic inflation control factors (Supplementary Table 2). We performed an inverse variance–weighted meta-analysis for 2,362,330 SNPs (Online Methods) with correction for the overall genomic inflation control factor (λ = 1.007; quantile-quantile plot for the meta-analysis is shown in Supplementary Fig. 1).

SNPs in seven loci exhibited association with mean LTL at genome-wide significance (P < 5 × 10−8; Figs. 1, 2, Table 1 and Supplementary Fig. 2). The association of the lead SNP on chromosome 2p16.2 (rs11125529) was very close to the threshold for genome-wide significance, and the lead SNP in a locus on 16q23.3 (rs2967374) fell just short of this threshold (Table 1). We therefore sought replication of results for these two loci. We confirmed the association of rs11125529
Does PCB exposure influence expression of 24 (29) genes
implicated in telomere length GWAS?
but not of rs2967374 (Table 1). The combined P value from the GWAS meta-analyses and replication cohorts for rs11125529 was 7.50 × 10−10. There was no evidence of sex-dependent effects or additional independent signals at any of these loci (Online Methods and Supplementary Tables 3, 4).

Details of key genes in each locus associated with LTL and their location in relation to the lead SNP are provided in Supplementary Table 5. The most significantly associated locus we found was the previously reported TERC locus on 3q26 (Figs. 1, 2 and Table 1) (17). Four additional loci, 5p15.33 (TERT), 4q32.2 (NAF1, nuclear assembly factor 1), 10q24.33 (OBFC1, oligonucleotide/oligosaccharide-binding fold containing 1) (18) and 20q13.3 (RTEL1, regulator of telomere elongation helicase 1), harbor genes that encode proteins with known function in telomere biology (3,20–23). NAF1 protein is required for assembly of H/ACA box small nucleolar RNA, the RNA family to which TERC belongs (20). Thus, the three most significantly associated loci (3q26, 5p15.33 and 4q32.2) harbor genes involved in the formation and activity of telomerase. We therefore examined whether the lead SNPs at these loci as well as the other identified loci associate with leukocyte telomerase activity in available data from 208 individuals. We did not find an association of any of the variants with telomerase activity (Supplementary Table 6). However, the study only had 80% power (α of 0.05) to detect a SNP effect that explained 3.7% of the variance in telomerase activity, and therefore smaller effects are likely to have been missed in this exploratory analysis.

We also found a significant association (P = 6.90 × 10−11) at the previously reported OBFC1 locus (18). OBFC1 is a component of the telomere-binding CST complex that also contains CTC1 and TEN1 (ref. 21). In yeast, this complex binds to the single-stranded guanine overhang at the telomere and functions to promote telomere replication. RTEL1 is a DNA helicase that has been shown to have important roles in setting telomere length, telomere maintenance and DNA repair in mice (22,23). However, it should be noted that the [...]
Figure 1 Signal-intensity plot of genotype
association with telomere length. Data
are displayed as –log10(P values) against
chromosomal location for the 2,362,330 SNPs
that were tested. The dotted line represents a
genome-wide level of significance at P = 5 × 10−8.
Loci that showed an association at this level are
plotted in red.
[Figure residue: regional association plots (panels a, b, c) for rs10936599, rs2736100 and rs7675998, showing –log10(P value), r2 and recombination rate; and a genome-wide plot of –log10(P value) against chromosome (1 to 22) with labeled loci ACYP2, NAF1, TERT, OBFC1, ZNF208, RTEL1 and TERC]
24. Samples exposed to PCBs associated with difference in genes
implicated in telomere length GWAS?
Expression differences for 24 GWAS implicated genes
Queried the Gene Expression Omnibus for PCBs
Affymetrix human arrays (GPL570)
7 gene expression experiments on humans
52 exposed; 14 unexposed
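A comparison like the one on this slide, exposed versus unexposed samples for one probe, can be sketched as below. The slide does not say which test was used, so the Welch t-statistic here is an assumption chosen for illustration (it tolerates unequal group sizes such as 52 versus 14), and the expression values are made up.

```python
import math
from statistics import mean, variance


def welch_t(exposed, unexposed):
    """Return (difference in means, Welch t statistic) for two groups of
    log-scale expression values. Each group needs at least two samples."""
    diff = mean(exposed) - mean(unexposed)
    # Standard error without assuming equal variances in the two groups.
    se = math.sqrt(variance(exposed) / len(exposed)
                   + variance(unexposed) / len(unexposed))
    return diff, diff / se


# Made-up log2 expression values for one hypothetical probe.
exposed = [7.9, 8.1, 8.4, 8.0, 8.6]
unexposed = [7.5, 7.7, 7.6, 7.8]
diff, t_stat = welch_t(exposed, unexposed)
print(round(diff, 3), round(t_stat, 2))
```

In a real analysis, the statistic would be converted to a p-value and corrected for the number of probes tested; samples pooled from several experiments would also need a batch adjustment, which this sketch omits.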
Differential gene expression and a functional analysis of PCB-exposed children: Understanding disease and disorder development
Sisir K. Dutta (a,*), Partha S. Mitra (a,1), Somiranjan Ghosh (a,1), Shizhu Zang (a,1), Dean Sonneborn (b), Irva Hertz-Picciotto (b), Tomas Trnovec (c), Lubica Palkovicova (c), Eva Sovcikova (c), Svetlana Ghimbovschi (d), Eric P. Hoffman (d)
(a) Molecular Genetics Laboratory, Howard University, Washington, DC, USA
(b) Department of Public Health Sciences, University of California Davis, Davis, CA, USA
(c) Slovak Medical University, Bratislava, Slovak Republic
(d) Center for Genetic Medicine, Children's National Medical Center, Washington, DC, USA
Environment International 40 (2012) 143–154
Article history: Received 20 December 2010; Accepted 10 July 2011
Abstract: The goal of the present study is to understand the probable molecular mechanism of toxicities and the associated pathways related to observed pathophysiology in high PCB-exposed populations. We have performed a microarray-based differential gene expression analysis of children (mean age 46.1 months) of [...]
25. Samples exposed to PCBs associated with difference in genes
implicated in telomere length GWAS?
[Figure: volcano plot. x-axis: log(difference) (−0.50 to 0.75); y-axis: −log10(pvalue) (0 to 2)]
Labeled probes: 1555203_s_at (SLC44A4), 1555203_s_at (MYNN), 224206_x_at (MYNN)
26. Interdependencies of the exposome:
Correlation globes paint a complex view of exposure
Red: positive ρ
Blue: negative ρ
thickness: |ρ|
permuted data to produce
“null ρ”
sought replication in > 1
cohort
Pac Symp Biocomput 2015
JECH 2015
for each pair of E:
Spearman ρ
(575 factors: 81,937 correlations)
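The per-pair computation named on this slide, a Spearman ρ compared against a "null ρ" from permuted data, can be sketched in plain Python as follows. This is an illustrative sketch rather than the published pipeline: the number of permutations, the two-sided comparison and the fixed seed are assumptions, and the replication step across cohorts is not shown.

```python
import random


def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; inputs must not be constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |null rho| >= |observed rho|."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up measurements of two hypothetical exposures in eight participants.
x = [1.2, 2.3, 2.9, 4.1, 5.0, 6.2, 7.4, 8.8]
y = [0.9, 1.1, 2.0, 1.8, 3.5, 3.9, 4.2, 5.1]
rho = spearman(x, y)
p_value = permutation_p(x, y)
print(round(rho, 3), p_value)
```

Rank-based correlation suits exposure data because many biomarkers are skewed or censored at a detection limit, and ranks are unaffected by monotone transformations such as the logarithm.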
27. Interdependencies of the exposome:
Correlation globes paint a complex view of exposure
Effective number of
variables:
500 (10% decrease)
28. Telomere Length All-cause mortality
http://bit.ly/globebrowse
Interdependencies of the exposome:
Telomeres vs. all-cause mortality
29. Browse these and 82 other phenotype-exposome globes!
http://www.chiragjpgroup.org/exposome_correlation
30. What nodes have the most correlations / have the most connections?
(“hubs of the network”)
(What factors are correlated with others the most?)
income...
AJE, 2015
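One simple way to look for such hubs is to count, for each factor, how many other factors it is correlated with. The sketch below assumes a plain |ρ| cutoff to decide which pairs count as connected; that threshold is an illustrative stand-in, since the slides define connections through a permutation null and replication, and the factor names and values are made up.

```python
from collections import Counter


def hub_degrees(correlations, threshold=0.3):
    """Count, for each factor, how many partners it is linked to with
    |rho| >= threshold. `correlations` maps (factor_a, factor_b) -> rho."""
    degree = Counter()
    for (a, b), rho in correlations.items():
        if abs(rho) >= threshold:
            degree[a] += 1
            degree[b] += 1
    # Most connected factors first.
    return degree.most_common()


# Made-up correlations between hypothetical factors, for illustration only.
example = {
    ("income", "cotinine"): -0.45,
    ("income", "lead"): -0.35,
    ("income", "vitamin_d"): 0.32,
    ("cotinine", "cadmium"): 0.50,
    ("lead", "vitamin_d"): -0.05,
}
hubs = hub_degrees(example)
print(hubs[0])
```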
31. Lower income associated with 43 of 330 (>13%) exposures
and biomarkers in the US population
Higher income: lower levels of biomarkers
(Another 23 associated with higher levels; 66 of 330 = 20% in total)
AJE, 2015
[Figure: overall income/poverty ratio effects (per 1SD), validated results. x-axis: effect size per 1SD of income/poverty ratio (-0.3 to 0.0); y-axis: -log10(pvalue) (10 to 30)]
Labeled exposures and biomarkers:
Pulse rate
Eosinophils number
Lymphocyte number
Monocyte
Segmented neutrophils number
Blood 2,5-Dimethylfuran
Cadmium
Lead
Cotinine
C-reactive protein
Floor, GFAAS
Protoporphyrin
Glycohemoglobin
Glucose, plasma
g-tocopherol
Hepatitis A Antibody
Homocysteine
Herpes I
Herpes II
Red cell distribution width
Alkaline phosphatase
Globulin
Glucose, serum
Gamma glutamyl transferase
Triglycerides
Blood Benzene
Blood 1,4-Dichlorobenzene
Blood Ethylbenzene
Blood Styrene
Blood Toluene
Blood m-/p-Xylene
White blood cell count
Mono-benzyl phthalate
3-fluorene
2-fluorene
3-phenanthrene
2-phenanthrene
1-pyrene
Cadmium, urine
Albumin, urine
Lead, urine
32. Studying the Elusive Environment in Large Scale
Itispossiblethatmorethan50%ofcomplexdiseaserisk
isattributedtodifferencesinanindividual’senvironment.1
Airpollution,smoking,anddietaredocumentedenviron-
mental factors affecting health, yet these factors are but
a fraction of the “exposome,” the totality of the exposure
loadoccurringthroughoutaperson’slifetime.1
Investigat-
ing one or a handful of exposures at a time has led to a
highly fragmented literature of epidemiologic associa-
tions. Much of that literature is not reproducible, and se-
lectivereportingmaybeamajorreasonforthelackofre-
producibility. A new model is required to discover
environmental exposures associated with disease while
mitigating possibilities of selective reporting.
Toremedythelackofreproducibilityandconcernsof
validity, multiple personal exposures can be assessed si-
multaneously in terms of their association with a condi-
tion or disease of interest; the strongest associations can
then be tentatively validated in independent data sets
(eg, as done in references 2 and 3).2,3
The main advan-
tages of this process include the ability to search the list
ofexposuresandadjustformultiplicitysystematicallyand
reportalltheprobedassociationsinsteadofonlythemost
significant results. The term “environment-wide associa-
tion studies” (EWAS) has been used to describe this ap-
proach (an analogy to genome-wide association stud-
ies).Forexample,Wangetal4
screenedmorethan2000
chemicalsinserumtodiscoverendogenousexposuresas-
sociated with risk for cardiovascular disease.
Therearenotablehurdlesinanalyzing“big”environ-
mental data. These same problems affect epidemiology
of1-risk-factor-at-a-time,butinEWAStheirprevalencebe-
comes more clearly manifest at large scale. When study-
the EWAS vantage point, intervening on β-carotene
(Figure, D) seems a futile exercise given its complex rela-
tionship with other nutrients and pollutants.
Giventhiscomplexity,howcanstudiesofenvironmen-
talriskmoveforward?First,EWASanalysesshouldbeap-
pliedtomultipledatasets,andconsistencycanbeformally
examinedforallassessedcorrelations.Second,thetempo-
ral relationship between exposure and changes in health
parametersmayofferhelpfulhintsaboutwhichofthesig-
nalsaremorethansimplecorrelations.Third,standardized
adjustedanalyses,inwhichadjustmentsareperformedsys-
tematicallyandinthesamewayacrossmultipledatasets,
may also help. This is in stark contrast with the current
model,wherebymostepidemiologicstudiesusesingledata
setswithoutreplicationaswellasnon–time-dependentas-
sessments,andreportedadjustmentsaremarkedlydiffer-
entacrossreportsanddatasets,eventhoseperformedby
thesameteam(differentapproachesincreasevaliditybut
mustbereconciledandassimilated).
However, eventually for most environmental correlates, there may be unsurpassable difficulty establishing potential causal inferences based on observational data alone. Factors that seem protective may sometimes be tested in randomized trials. The complexity of the multiple correlations also highlights the challenge that intervening to modify 1 putative risk factor also may inadvertently affect multiple other correlated factors. Even when a seemingly simple intervention is tested in randomized trials (affecting a single risk factor among the many correlations), the intervention is not really simple. In essence what is tested are multiple perturbations of
factors correlated with the one targeted for interven-
VIEWPOINT
Chirag J. Patel, PhD
Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts.
John P. A. Ioannidis, MD, DSc
Stanford Prevention Research Center, Department of Health Research and Policy, Department of Medicine, Stanford University School of Medicine, Stanford, California, Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, California, and Meta-Research Innovation Center at Stanford (METRICS), Stanford, California.
Opinion
JAMA, 2014
JECH, 2014
Proc Symp Biocomp, 2015
How can we study the elusive environment in larger scale for
biomedical discovery?
Studying the Elusive Environment in Large Scale
It is possible that more than 50% of complex disease risk is attributed to differences in an individual’s environment.¹ Air pollution, smoking, and diet are documented environmental factors affecting health, yet these factors are but a fraction of the “exposome,” the totality of the exposure load occurring throughout a person’s lifetime.¹ Investigating one or a handful of exposures at a time has led to a highly fragmented literature of epidemiologic associations. Much of that literature is not reproducible, and selective reporting may be a major reason for the lack of reproducibility. A new model is required to discover environmental exposures associated with disease while mitigating possibilities of selective reporting.
To remedy the lack of reproducibility and concerns of validity, multiple personal exposures can be assessed simultaneously in terms of their association with a condition or disease of interest; the strongest associations can then be tentatively validated in independent data sets (eg, as done in references 2 and 3).²,³ The main advantages of this process include the ability to search the list of exposures and adjust for multiplicity systematically and report all the probed associations instead of only the most significant results. The term “environment-wide association studies” (EWAS) has been used to describe this approach (an analogy to genome-wide association studies).
High-throughput ascertainment of endogenous indicators of environmental exposure that may reflect the exposome increasingly attract attention, and their performance needs to be carefully evaluated. These include chemical detection of indicators of exposure through metabolomics, proteomics, and biosensors.⁷ Eventually, patterns of
US federally funded gene expression experiment data be d
ited in public repositories such as the Gene Expression Omnibu
repository has been instrumental in development of technolo
measurement of gene expression, data standardization, and
of data for discovery. Just as with the Gene Expression Omnib
Figure. Correlation Interdependency Globes for 4 Environmental Exposures (Cotinine, Mercury, Cadmium, Trans-β-Carotene) in National Health and Nutrition Examination Survey (NHANES) Participants, 2003-2004
Panels: A, Serum cotinine (37 total correlations); B, Serum total mercury (42 total correlations); C, Serum cadmium (68 total correlations); D, Serum trans-β-carotene (68 total correlations)
Legend: negative correlation, positive correlation; node categories: infectious agents, pollutants, nutrients and vitamins, demographic attributes
Each correlation interdependency globe includes 317 environmental exposures represented by the nodes around the periphery of the globe. Pairwise correlations are depicted by edges (lines) between the node of interest (arrowhead) and other nodes. Correlations with absolute values exceeding 0.2 are shown (stronge
The size of each node is proportional to the number of edges for a node, and
thickness of each edge indicates the magnitude of the correlation.
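The caption's construction (pairwise correlations around one exposure of interest, keeping only edges with absolute value above 0.2) can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' pipeline: the Pearson correlation, the hidden `smoking` variable, and the simulated exposure values are all assumptions made for the example.

```python
import numpy as np

def correlation_edges(data, names, focus, threshold=0.2):
    """Return (other_exposure, r) pairs whose |r| with `focus` exceeds `threshold`."""
    corr = np.corrcoef(data, rowvar=False)  # columns are exposures
    i = names.index(focus)
    edges = [(names[j], float(corr[i, j]))
             for j in range(len(names))
             if j != i and abs(corr[i, j]) > threshold]
    # Strongest correlations first, mirroring "edge thickness" in the figure
    return sorted(edges, key=lambda e: -abs(e[1]))

# Synthetic stand-in for survey data (hypothetical values, not NHANES)
rng = np.random.default_rng(0)
n = 500
smoking = rng.normal(size=n)                   # hidden shared driver (assumption)
cotinine = smoking + 0.5 * rng.normal(size=n)
cadmium = 0.6 * smoking + rng.normal(size=n)
carotene = -0.5 * smoking + rng.normal(size=n)
mercury = rng.normal(size=n)                   # unrelated to the others

names = ["cotinine", "cadmium", "carotene", "mercury"]
data = np.column_stack([cotinine, cadmium, carotene, mercury])

edges = correlation_edges(data, names, "cotinine")
degree = len(edges)  # number of edges, i.e. the node size in the globe
for name, r in edges:
    print(f"cotinine -- {name}: r = {r:+.2f}")
```

Running the same function with each exposure as `focus` would give one "globe" per exposure; the edge count per node corresponds to the "total correlations" reported under each panel.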
• bioinformatics to connect exposome with phenome
• new ‘omics technologies to measure the exposome
• dense correlations
• reverse causality
• confounding
• (longitudinal) publicly available data
34. with Paul Avillach, Michael McDuffie, Jeremy Easton-Marks,
Cartik Saravanamuthu and the BD2K PIC-SURE team
40K participants
>1000 indicators of exposure
Data and API available now
http://nhanes.hms.harvard.edu
BD2K Patient-Centered Information Commons
NHANES exposome browser
35. Connecting Environmental Exposure with Disease:
Missing the “System” of Exposures?
[2×2 table: exposure status (E+, E-) against disease status (diseased, non-diseased), marked with “?”]
Exposed to many things, but do not assess the multiplicity.
Fragmented literature of associations.
Challenge to discover E associated with disease.
36. Example of fragmentation:
Is everything we eat associated with cancer?
Schoenfeld and Ioannidis, AJCN (2012)
50 random ingredients from
Boston Cooking School
Cookbook
Any associated with cancer?
FIGURE 1. Effect estimates reported in the literature by malignancy type (top) or ingredient (bottom). Only ingredients with ≥10 studie
outliers are not shown (effect estimates >10).
Of 50, 40 studied in a cancer risk
Weak statistical evidence:
non-replicated
inconsistent effects
non-standardized
41. JCE, 2015
Janus (two-faced) risk profile
Risk and significance depend on modeling scenario!
The Vibration of Effects: beware of the Janus effect
(both risk and protection?!)
“risk” “protection”
“significant”
Britannica.com
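One way to see how a Janus profile can arise is to refit the same exposure-outcome model under every combination of adjustment variables and look at the spread of the estimates. The sketch below does this with ordinary least squares on simulated data. The linear model, the variable names, and the simulated confounder are assumptions for illustration only; the slide does not specify the models used in the cited JCE 2015 analysis.

```python
import itertools
import numpy as np

def exposure_coefficient(y, exposure, covariates):
    """OLS coefficient on `exposure`, adjusting for the given covariate columns."""
    columns = [np.ones_like(y), exposure] + list(covariates)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

def vibration_of_effects(y, exposure, adjusters):
    """Map every subset of adjuster names to the exposure estimate under that model."""
    results = {}
    names = sorted(adjusters)
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            covs = [adjusters[name] for name in subset]
            results[subset] = exposure_coefficient(y, exposure, covs)
    return results

# Simulated data (hypothetical): the exposure is truly protective (-0.2),
# but a confounder raises both the exposure and the outcome.
rng = np.random.default_rng(1)
n = 2000
confounder = rng.normal(size=n)
other = rng.normal(size=n)  # irrelevant covariate
exposure = confounder + rng.normal(size=n)
y = -0.2 * exposure + 1.0 * confounder + rng.normal(size=n)

effects = vibration_of_effects(y, exposure, {"confounder": confounder, "other": other})
lowest, highest = min(effects.values()), max(effects.values())
for subset, estimate in effects.items():
    print(f"adjusted for {subset or 'nothing'}: {estimate:+.2f}")
```

In this toy setup the estimate changes sign depending on whether the confounder is in the model, which is the "both risk and protection" pattern the slide warns about. With k candidate adjusters there are 2^k models, so the enumeration grows quickly.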
44. P
We are many phenotypes simultaneously:
Can we better categorize these P?
Body Measures
Body Mass Index
Height
Blood pressure & fitness
Systolic BP
Diastolic BP
Pulse rate
VO2 Max
Metabolic
Glucose
LDL-Cholesterol
Triglycerides
Inflammation
C-reactive protein
white blood cell count
Kidney function
Creatinine
Sodium
Uric Acid
Liver function
Aspartate aminotransferase
Gamma glutamyltransferase
Aging
Telomere length
45. EWAS-derived phenotype-exposure association map:
A 2-D view of phenotype-exposure associations for reclassification
PCB170
Glucose
BMI
Height
Cholesterol
β-carotene
folate
http://bit.ly.com/pemap
46. Creation of a phenotype-exposure association map:
A 2-D view of 83 phenotype by 252 exposure associations
Association size: > 0, < 0
Clusters of exposures associated with clusters of phenotypes?
252 biomarkers of exposure × 83 clinical trait phenotypes
NHANES 1999-2000, 2001-2002, 2005-2006
~21K regressions: replicated significant (FDR < 5%) in 2003-2004
adjusted by age, age², sex, race, income, chronic disease
Hugues Aschard, JP Ioannidis
83 phenotypes
252 exposures
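The scan described on this slide (one adjusted regression per exposure, an FDR threshold of 5%, then replication in a held-out survey wave) can be sketched as follows. This is a simplified illustration under stated assumptions: the data are simulated, only standardized age and age² are used as adjusters, p-values use a normal approximation, and survey weights are ignored. The function names are hypothetical and not taken from the authors' code.

```python
import math
import numpy as np

def association(y, x, covariates):
    """Return (beta, p) for x in an OLS model of y adjusted for covariates.
    The p-value uses a normal approximation (reasonable only for large n)."""
    X = np.column_stack([np.ones_like(y), x, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    z = beta[1] / se
    return float(beta[1]), math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of p-values rejected at FDR level alpha (BH step-up)."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        cutoff = np.max(np.nonzero(passed)[0])
        keep[order[:cutoff + 1]] = True
    return keep

def ewas(y, exposures, covariates):
    """Run one adjusted regression per exposure column."""
    results = [association(y, exposures[:, j], covariates)
               for j in range(exposures.shape[1])]
    return (np.array([b for b, _ in results]),
            np.array([p for _, p in results]))

def simulate_cohort(seed, n=1500, n_exposures=20):
    """Hypothetical cohort: only exposure 0 truly affects the phenotype."""
    rng = np.random.default_rng(seed)
    age_z = rng.normal(size=n)  # standardized age
    covariates = np.column_stack([age_z, age_z ** 2])
    exposures = rng.normal(size=(n, n_exposures))
    y = 0.3 * exposures[:, 0] + 0.5 * age_z + rng.normal(size=n)
    return y, exposures, covariates

disc_beta, disc_p = ewas(*simulate_cohort(seed=10))  # discovery waves
rep_beta, rep_p = ewas(*simulate_cohort(seed=11))    # held-out wave

hits = np.nonzero(benjamini_hochberg(disc_p))[0]
same_sign = np.sign(rep_beta[hits]) == np.sign(disc_beta[hits])
replicated = hits[benjamini_hochberg(rep_p[hits]) & same_sign].tolist()
print("replicated exposure indices:", replicated)
```

Applying the same adjustment set to every exposure and every data set is the "standardized adjusted analyses" idea; requiring the same sign in the replication wave is an extra safeguard chosen for this sketch.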
53. Triglycerides
Total Cholesterol
LDL-cholesterol
Trunk Fat
Albumin, urine
Insulin
Total Fat
Head Circumference
Blood urea nitrogen
Albumin
Homocysteine
C-peptide: SI
C-reactive protein
Body Mass Index
Ferritin
Thigh Circumference
Maximal Calf Circumference
Direct HDL-Cholesterol
Total calcium
Total bilirubin
Red cell distribution width
Gamma glutamyl transferase
Mean cell volume
Mean cell hemoglobin
White blood cell count
Uric acid
Protoporphyrin
Hemoglobin
Total protein
Alkaline phosphatase
Waist Circumference
Hematocrit
Weight
Standing Height
1/Creatinine
Creatinine
Trunk Lean excl BMC
Methylmalonic acid
Triceps Skinfold
Lymphocyte number
Subscapular Skinfold
Total Lean excl BMC
Segmented neutrophils number
Lactate dehydrogenase LDH
Bone alkaline phosphatase
TIBC, Frozen Serum
Aspartate aminotransferase AST
Phosphorus
Lumbar Pelvis BMD
Glycohemoglobin
Globulin
Chloride
Bicarbonate
Alanine aminotransferase ALT
60 sec. pulse:
Upper Leg Length
Total BMD
Potassium
Glucose, serum
Glucose, plasma
Red blood cell count
Lumbar Spine BMD
Platelet count SI
MCHC
Osmolality
Monocyte number
mean systolic
Lymphocyte percent
Segmented neutrophils percent
Recumbent Length
Eosinophils number
Monocyte percent
Head BMD
mean diastolic
Prostate specific antigen ratio
60 sec HR
Basophils number
Sodium
PSA, free
Mean platelet volume
Eosinophils percent
PSA, total
Basophils percent
[Bar chart: R² × 100 for each phenotype listed above; horizontal axis from 0 to 40]
1 to 66 exposures identified for 81 phenotypes
Additive effect of E factors:
Describe < 20% of variability in P
(On average: 8%)
$\sigma^2_E$?
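The "additive effect of E factors" figure is a variance-explained (R²) number: fit one model with all identified exposures for a phenotype and ask what share of the phenotype's variance the fit accounts for. A minimal sketch on simulated data follows. The effect sizes and the linear model are assumptions chosen only to illustrate the calculation, not values from the study.

```python
import numpy as np

def r_squared(y, X):
    """Share of variance in y explained by a linear model on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

# Simulated phenotype (hypothetical): two weak exposure effects plus noise,
# and a third exposure with no effect.
rng = np.random.default_rng(2)
n = 5000
exposures = rng.normal(size=(n, 3))
y = 0.2 * exposures[:, 0] + 0.2 * exposures[:, 1] + rng.normal(size=n)

r2_joint = r_squared(y, exposures)
print(f"R^2 * 100 = {100 * r2_joint:.1f}")
```

Even when every included exposure has a real effect, small per-exposure effects add up to a modest R², which is one reading of why the measured exposures describe only a minority of the variability in P.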
54. Emerging technologies to ascertain exposome will enable
biomedical discovery
High-throughput E standards:
mitigate fragmented literature of associations
Confounding, reverse causality:
how to handle at large dimension?
e.g., EWASs in T2D, telomere length, and mortality
Facilitate G and E interaction investigations and
more precise definitions of P
56. Harvard HMS
Isaac Kohane
Susanne Churchill
Stan Shaw
Nathan Palmer
Jenn Grandfield
Sunny Alvear
Michal Preminger
Harvard Chan
Hugues Aschard
Francesca Dominici
Stanford
John Ioannidis
Atul Butte (UCSF)
U Queensland
Jian Yang
Peter Visscher
Cochrane
Belinda Burford
Chirag Lakhani
Adam Brown
Nam Pho
Danielle Rasooly
Arjun Manrai
Chirag J Patel
chirag@hms.harvard.edu
@chiragjp
www.chiragjpgroup.org
CDC/NCHS
Ajay Yesupriya
Imperial
Ioanna Tzoulaki
Paul Elliott
Lund (Sweden)
Jan Sundquist
Kristina Sundquist
NIH Common Fund
Big Data to Knowledge
Thanks...
Stefano Monti
David Scherr