Physics, AI, and the Environment
Arjun (Raj) Manrai, Ph.D.

National Academies of Sciences, Engineering, and Medicine
June 6, 2019
@arjunmanrai
Arjun_Manrai@hms.harvard.edu

Harvard Medical School
Computational Health Informatics Program, Boston Children’s Hospital
Supervised machine learning performs well with:

(a) Lots of clean, labeled data [Ex: ImageNet]

(b) Efficient compute, time [Ex: GPUs]

(c) Algorithmic advancements [Ex: Dropout]
Do we have these for environmental health
and the exposome?
Exposures are complex: densely correlated,
time-varying, and often difficult to measure.
[Figure: exposure correlation globe. For each pair of E: Spearman ρ (575 factors: 81,937 correlations). Red: positive ρ; blue: negative ρ; thickness: |ρ|.]
Permuted data to produce “null ρ”; sought replication in > 1 cohort.
Patel & Manrai PSB 2015
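The pairwise-correlation idea on this slide can be sketched in a few lines of R. This is a minimal illustration on simulated data, not the authors' pipeline: the exposure names, sample size, number of permutations, and the pooled empirical p-value are all assumptions made for the example.

```r
# Simulated stand-in for an exposure matrix (rows = participants, columns = exposures).
# All variable names and sizes here are hypothetical.
set.seed(1)
n <- 500
shared <- rnorm(n)                       # latent source shared by two exposures
E <- data.frame(
  lead     = shared + rnorm(n),
  cadmium  = shared + rnorm(n),
  cotinine = rnorm(n),
  vitaminC = rnorm(n)
)

# Observed Spearman rho for every pair of exposures.
rho_obs <- cor(E, method = "spearman")
pairs   <- which(upper.tri(rho_obs), arr.ind = TRUE)
obs     <- rho_obs[upper.tri(rho_obs)]

# Null distribution: permute each column independently to break the
# dependence between exposures, then recompute all pairwise rho.
n_perm <- 200
null_rho <- replicate(n_perm, {
  E_perm <- as.data.frame(lapply(E, sample))
  r <- cor(E_perm, method = "spearman")
  r[upper.tri(r)]
})

# Empirical two-sided p-value per pair: share of pooled null |rho| at least as large.
p_emp <- sapply(abs(obs), function(r) {
  (sum(abs(null_rho) >= r) + 1) / (length(null_rho) + 1)
})

result <- data.frame(
  exposure_1  = colnames(E)[pairs[, "row"]],
  exposure_2  = colnames(E)[pairs[, "col"]],
  rho         = round(obs, 3),
  p_empirical = signif(p_emp, 3)
)
print(result)
```

Only the lead and cadmium columns share a source here, so that pair should stand out against the permuted null while the others should not. Replication in a second cohort, as on the slide, would repeat the same computation on independent data.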
Exposures can have very different timescales throughout life
• Cumulative (cadmium, PCB)
• Constant, but excreted (phenols, vitamins)
• Intervention (drugs)
• Seasonal (allergen)
• In-utero
• Not shown: Diurnal
Athersuch Bioanalysis 2012
AI has been successful ‘out of the box’ for many
problems in health, but it is unlikely to be so for
many in environmental health and exposome.
Pervasive issues around measurement and
reproducibility are likely to limit application.
Reproducibility, old and new
“Non-reproducible single
occurrences are of no
significance to science.”
—Karl Popper
“It’s basically collecting lots of
variables and then playing with your
data until you find something that
counts as statistically significant but
is probably meaningless.”
—John Oliver
Let’s take a deeper look by starting with a very
powerful ML method for the exposome…linear
regression!
Suppose we want to associate lead
exposure and C-reactive protein (CRP), a
blood marker of inflammation
Regression gives us:

• Variance explained

• Interpretable coefficients

• Predictions

• Uncertainty
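As a rough sketch of where each of these four outputs comes from in R, the block below fits a single regression on simulated data. The numbers are invented for illustration and are not NHANES estimates; only the variable names `CRP` and `lead` follow the slide.

```r
set.seed(5)
n <- 300
d <- data.frame(lead = rlnorm(n, meanlog = 0, sdlog = 0.5))
d$CRP <- 1 + 0.4 * d$lead + rnorm(n)   # simulated relationship, not an NHANES estimate

fit <- lm(CRP ~ lead, data = d)

# Variance explained: R-squared of the fitted model
variance_explained <- summary(fit)$r.squared

# Interpretable coefficients: estimate, standard error, t statistic, p-value
coefficients_table <- summary(fit)$coefficients

# Predictions: expected CRP at a new lead value, with a 95% prediction interval
prediction <- predict(fit, newdata = data.frame(lead = 2), interval = "prediction")

# Uncertainty: 95% confidence interval for the lead coefficient
uncertainty <- confint(fit, "lead", level = 0.95)
```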
Step 1: Get data
NHANES <- readRDS('NHANES_NAS.rds')
Step 2: Build an unadjusted model
lm(CRP ~ lead, data = NHANES)    # P-value = 0.12
Step 3: Adjust for sex and BMI, and then also race, and SES, and then also smoking minus SES…
lm(CRP ~ lead + factor(sex) + bmi, data = NHANES)
lm(CRP ~ lead + factor(sex) + bmi + factor(race), data = NHANES)
…..
P-value = 0.053: “trending towards significance”
Step 4: Aha moment! Filter to a more precise group
library(dplyr)    # provides %>% and filter()
NHANES2 <- NHANES %>% filter(age > 25 & age < 35)
lm(CRP ~ lead + factor(sex) + bmi, data = NHANES2)
Step 5…
Step 6…
Step 7…
Bingo! p < 0.05
Publish.
Tenure.
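Why this sequence of steps is a problem can be shown with a small null simulation. The sketch below is an illustration under stated assumptions, not an analysis of NHANES: the data are simulated so that lead has no effect on CRP, and the three model specifications and four age subgroups are hypothetical choices loosely modeled on the steps above.

```r
# Null simulation: "lead" has no effect on "CRP" by construction.
# Variable names mirror the slide, but the data are simulated, not NHANES.
set.seed(42)

one_study <- function(n = 400) {
  d <- data.frame(
    lead = rnorm(n),
    sex  = factor(sample(c("F", "M"), n, replace = TRUE)),
    bmi  = rnorm(n, mean = 27, sd = 5),
    race = factor(sample(1:4, n, replace = TRUE)),
    age  = runif(n, 18, 80)
  )
  d$CRP <- 0.05 * d$bmi + rnorm(n)   # CRP depends on BMI only

  specs <- list(
    CRP ~ lead,
    CRP ~ lead + sex + bmi,
    CRP ~ lead + sex + bmi + race
  )
  subsets <- list(
    all      = rep(TRUE, n),
    age25_35 = d$age > 25 & d$age < 35,
    age35_50 = d$age >= 35 & d$age < 50,
    age50up  = d$age >= 50
  )

  # p-value for the lead coefficient under each specification x subgroup combination
  p <- c()
  for (f in specs) for (s in subsets) {
    fit <- lm(f, data = d[s, ])
    p <- c(p, summary(fit)$coefficients["lead", "Pr(>|t|)"])
  }
  c(first = p[1], best = min(p))   # pre-specified model vs. best of 12 attempts
}

res <- replicate(500, one_study())
false_pos_prespecified <- mean(res["first", ] < 0.05)
false_pos_best_of_12   <- mean(res["best", ] < 0.05)
```

The pre-specified model should reject at roughly the nominal 5% rate, while reporting the smallest of the 12 p-values should reject noticeably more often, even though nothing is there to find.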
Formalize and scale with the Vibration of Effects
[Figure: vibration-of-effects panels for PCB, β-carotene, C-reactive protein, and cotinine, with the “Janus effect” labeled]
Patel et al. JCE 2015
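A minimal sketch of the vibration-of-effects idea: fit the same exposure-outcome model under every combination of adjustment variables and look at how far the estimate and p-value move. The data-generating process, the four adjusters, and the max-minus-min summaries below are simplifying assumptions for illustration; they are not the percentile-based definitions or the data used in the cited paper.

```r
set.seed(7)
n <- 1000
d <- data.frame(
  sex     = rbinom(n, 1, 0.5),
  bmi     = rnorm(n, 27, 5),
  smoking = rbinom(n, 1, 0.25),
  income  = rnorm(n)
)
# Hypothetical data-generating process: smoking raises both lead and CRP.
d$lead <- 0.8 * d$smoking - 0.3 * d$income + rnorm(n)
d$CRP  <- 0.05 * d$lead + 0.6 * d$smoking + 0.05 * d$bmi + rnorm(n)

adjusters <- c("sex", "bmi", "smoking", "income")

# Enumerate every subset of adjusters (2^4 = 16 models, including the unadjusted one).
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(adjusters)))
names(grid) <- adjusters

voe <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  keep <- adjusters[unlist(grid[i, ])]
  f    <- reformulate(c("lead", keep), response = "CRP")
  co   <- summary(lm(f, data = d))$coefficients["lead", ]
  data.frame(model    = paste(c("lead", keep), collapse = " + "),
             estimate = co[["Estimate"]],
             p_value  = co[["Pr(>|t|)"]])
}))

# Summaries of how much the result "vibrates" across specifications.
relative_p_range <- log10(max(voe$p_value)) - log10(min(voe$p_value))
estimate_range   <- range(voe$estimate)
# Janus effect: the estimated association changes sign across specifications.
janus            <- any(voe$estimate > 0) && any(voe$estimate < 0)
```

Because smoking confounds the lead-CRP relationship in this simulation, models that omit it should give a larger lead coefficient than models that include it. Plotting `estimate` against `-log10(p_value)` for all rows of `voe` gives a simple version of the figure on this slide.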
When applying machine learning, we have 

even more analytic choices:
• Hyperparameters (e.g. learning rate)

• Model architecture (e.g. number of hidden layers)

• Declaration of improvement (e.g. delta AUC = 0.001)

• Splits (e.g. training/val/test splits)

• Many analysts (e.g. Kaggle competitions)
…What can we do?
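One way to see the effect of these choices is to compare many models that are pure noise on a single test set. The sketch below is an assumption-laden toy: 100 "models" stand in for hyperparameter settings or independent analysts, every model predicts random scores, and the `auc` helper is a hypothetical rank-based implementation written for this example.

```r
set.seed(3)

# Rank-based AUC (equivalent to the Mann-Whitney U statistic); assumes labels are 0/1.
auc <- function(score, label) {
  r     <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

n_test   <- 200
label    <- rbinom(n_test, 1, 0.5)
n_models <- 100     # e.g. hyperparameter settings or independent analysts

# Every "model" outputs pure noise, so its true AUC is exactly 0.5.
test_auc <- replicate(n_models, auc(rnorm(n_test), label))

# Reporting the winner on the same test set overstates performance.
best_on_test <- max(test_auc)

# Honest check: the selected model's (still random) predictions on fresh data.
fresh_label <- rbinom(n_test, 1, 0.5)
fresh_auc   <- auc(rnorm(n_test), fresh_label)
```

The average AUC across models should sit near 0.5, while the best one should look clearly better than chance on the test set it was selected on. The same logic applies when a small delta in AUC is declared an improvement after many tries.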
When do we address multiplicity?
[Scale from “Almost Always” to “Almost Never”]
We can look to two fields:

(1) Genomics

(2) Physics
Comparison #1: Genomics
Major study design change in human genetics research:
from candidate gene studies to genome-wide association studies.
Comparison #1: Genomics (continued)
Creation of a phenotype-exposure association map:
A 2-D view of 158 phenotype by 510 exposure associations
510 exposure (E) and diet indicators × 158 clinical trait phenotypes (P)
NHANES 1999-2000, 2001-2002, 2005-2006, …, 2011-2012 (8)
Median N: 150-5000 per survey
~67,281 E-P associations!
[Figure: 2-D map of 158 phenotypes × 510 exposures showing significant associations (FDR < 5%), adjusted by age, age², sex, race, income; color indicates association size (> 0 or < 0)]
Manrai et al. 2019
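The scan-then-correct pattern borrowed from genomics can be sketched as follows. This is a simulated toy, not the NHANES analysis: the 200 exposures, the 10 planted associations, the effect size, and the adjustment set (age, age squared, sex) are assumptions chosen for the example, and Benjamini-Hochberg is used here as one common way to control the FDR.

```r
set.seed(11)
n           <- 1000
n_exposures <- 200
n_true      <- 10     # hypothetical number of exposures truly associated with the phenotype

E <- matrix(rnorm(n * n_exposures), nrow = n)
colnames(E) <- paste0("exposure_", seq_len(n_exposures))
age <- runif(n, 20, 80)
sex <- rbinom(n, 1, 0.5)
phenotype <- as.vector(E[, 1:n_true] %*% rep(0.3, n_true)) + 0.01 * age + rnorm(n)

# One regression per exposure, with the same adjustment set each time.
p_values <- apply(E, 2, function(x) {
  fit <- lm(phenotype ~ x + age + I(age^2) + sex)
  summary(fit)$coefficients["x", "Pr(>|t|)"]
})

# Benjamini-Hochberg adjustment; keep associations with FDR < 5%.
q_values   <- p.adjust(p_values, method = "BH")
hits_fdr   <- names(q_values)[q_values < 0.05]
hits_naive <- names(p_values)[p_values < 0.05]   # uncorrected, for comparison
```

Comparing `hits_naive` with `hits_fdr` shows how many nominally significant exposures drop out once every test in the scan is accounted for.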
Comparison #1: Genomics (continued)
Large cohorts will expose massive misspecification,
confounding, and correlation amongst exposures.
Manrai, Ioannidis, Patel. AJE 2019
Comparison #2: Physics
Evidentiary standards in particle physics and the power of 

large, team science.
Discovery of the Higgs boson
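To put these evidentiary standards on a common scale, the sketch below converts between "sigma" levels and p-values under a standard normal. The specific thresholds are not on the slide and are added here as well-known conventions: 5 sigma (one-sided) for discovery claims in particle physics and p < 5e-8 (two-sided) for genome-wide significance. The helper names are hypothetical.

```r
# Convert between "sigma" thresholds and p-values under a standard normal.
sigma_to_p <- function(sigma) pnorm(sigma, lower.tail = FALSE)   # one-sided tail area
p_to_sigma <- function(p) qnorm(p / 2, lower.tail = FALSE)       # z for a two-sided p

p_five_sigma <- sigma_to_p(5)      # conventional discovery threshold in particle physics
z_gwas       <- p_to_sigma(5e-8)   # conventional genome-wide significance threshold
z_nominal    <- p_to_sigma(0.05)   # the familiar p < 0.05
```

Both fields demand test statistics far beyond the roughly 2 standard errors implied by p < 0.05, which is the contrast this comparison draws.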
Comparison #2: Physics (continued)
[Figure credit: Scientific American]
Comparison #2: Physics (continued)
[Figure credit: Nature]
Summary
The central challenges of applying AI in environmental health are not uniquely AI challenges. They are:

(1) Data: environmental/exposure data are time-varying, densely correlated, and often hard to measure [potential solutions: new measurement platforms, consortia]

(2) Analytic choices, multiplicity [potential solutions: pre-registration, -WAS, blinding]

(3) Often extreme missingness in data [potential solutions: new imputation methods]

Useful comparisons in high-throughput genomics and communal science in physics.
